Release Notes

Schema Release Notes

Version 1.3.1: October 13, 2020

Version 1.3 documentation patch release.

Alignment Schema:

  1. Added the deprecation tags for rearrangement_id, which were accidentally left out of the v1.3.0 release.

Version 1.3.0: May 28, 2020

Version 1.3 schema release.

New Schema:

  1. Introduced the Repertoire Schema for describing study metadata.

  2. Introduced the PCRTarget Schema for describing primer target locations.

  3. Introduced the SampleProcessing Schema for describing experimental processing steps for a sample.

  4. Replaced the SoftwareProcessing schema with the DataProcessing schema.

  5. Introduced experimental schema for clonal clusters, lineage trees, tree nodes, and cells as Clone, Tree, Node, and Cell objects, respectively.

General Updates:

  1. Added attributes to many schema properties as AIRR extension attributes in the x-airr field. The new Attributes object contains the definitions of these x-airr attributes.

  2. Added the top-level required property to all relevant schema objects.

  3. Added a title attribute containing a short, descriptive name to all relevant schema object fields.

  4. Added an example attribute containing an example data value to multiple schema object fields.
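As a rough illustration of how these attributes might be consumed, the sketch below walks a hand-written schema fragment and reports which fields are listed as top-level required and which carry an x-airr deprecation flag. The fragment, its values, and the x-airr keys used here (`deprecated`, `miairr`) are assumptions for illustration; consult airr-schema.yaml for the real definitions.

```python
# Hypothetical fragment shaped like an AIRR schema object; the x-airr keys
# and values here are illustrative assumptions, not copied from airr-schema.yaml.
SCHEMA_FRAGMENT = {
    "required": ["sequence_id", "locus"],
    "properties": {
        "sequence_id": {
            "type": "string",
            "title": "Sequence identifier",
            "x-airr": {"miairr": "important"},
        },
        "locus": {
            "type": "string",
            "title": "Gene locus",
            "example": "IGH",
            "x-airr": {"miairr": "important"},
        },
        "rearrangement_id": {
            "type": "string",
            "x-airr": {"deprecated": True},
        },
    },
}


def summarize(schema):
    """Return (required fields, deprecated fields) for one schema object."""
    required = list(schema.get("required", []))
    deprecated = [
        name
        for name, prop in schema.get("properties", {}).items()
        # A property without an x-airr block has no extension attributes.
        if prop.get("x-airr", {}).get("deprecated", False)
    ]
    return required, deprecated


print(summarize(SCHEMA_FRAGMENT))
```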

AIRR Data Commons API:

  1. Added OpenAPI V2 specification (specs/adc-api.yaml) for AIRR Data Commons API major version 1.
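Because the ADC API is described in OpenAPI, clients typically send JSON query documents. The sketch below only builds such a document and sends nothing; the `filters`/`op`/`content` shape, the `size` key, and the identifiers are assumptions to be checked against specs/adc-api.yaml.

```python
import json


def equals_filter(field, value):
    """Build a single equality clause (assumed ADC API filter shape)."""
    return {"op": "=", "content": {"field": field, "value": value}}


def and_query(*clauses, size=10):
    """Combine clauses with a logical AND and cap the number of results."""
    return {"filters": {"op": "and", "content": list(clauses)}, "size": size}


# Hypothetical identifiers; a client would POST this JSON body to a
# repository's repertoire query endpoint (path per specs/adc-api.yaml).
query = and_query(
    equals_filter("study.study_id", "example-study"),
    equals_filter("subject.subject_id", "subject-01"),
)
body = json.dumps(query)
print(body)
```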

Ontology Support:

  1. Added Ontology and CURIEResolution objects to support ontologies.

  2. Added vocabularies/ontologies as JSON strings for: Cell subset, Target substrate, Library generation method, Complete sequences, Physical linkage of different loci.
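Ontology-typed values carry a compact URI (CURIE) of the form prefix:local-ID, and a CURIEResolution-style prefix map turns it into a full IRI. A minimal resolver is sketched below; the prefix table is a hypothetical hand-written example rather than the schema's own CURIEResolution data, and the id/label shape of the Ontology value is assumed.

```python
# Hypothetical prefix map; real code should use the schema's CURIEResolution
# entries rather than this hand-written table.
PREFIXES = {
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    "NCBITAXON": "http://purl.obolibrary.org/obo/NCBITaxon_",
}


def resolve_curie(curie, prefixes=PREFIXES):
    """Expand 'PREFIX:LOCAL' into a full IRI using a prefix map."""
    prefix, sep, local = curie.partition(":")
    if not sep or not local:
        raise ValueError(f"not a CURIE: {curie!r}")
    try:
        return prefixes[prefix] + local
    except KeyError:
        raise ValueError(f"unknown CURIE prefix: {prefix!r}") from None


# Assumed Ontology value: the CURIE (id) paired with a human-readable label.
tissue = {"id": "UBERON:0002371", "label": "bone marrow"}
print(resolve_curie(tissue["id"]))
```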

Rearrangement Schema:

  1. Added the complete_vdj field to annotate whether a V(D)J alignment was full length.

  2. Added the junction_length_aa field defining the length of the junction amino acid sequence.

  3. Added the repertoire_id, sample_processing_id, and data_processing_id fields to serve as linkers to the appropriate metadata objects.

  4. Added a controlled vocabulary to the locus field: IGH, IGI, IGK, IGL, TRA, TRB, TRD, TRG.

  5. Deprecated the rearrangement_set_id and germline_database fields.

  6. Deprecated the rearrangement_id field and made sequence_id the primary unique identifier for a rearrangement record, both in files and in data repositories.

  7. Added support for secondary D gene rearrangement through the additional fields: d2_call, d2_score, d2_identity, d2_support, d2_cigar, np3, np3_aa, np3_length, n3_length, p5d2_length, p3d2_length, d2_sequence_start, d2_sequence_end, d2_germline_start, d2_germline_end, d2_alignment_start, d2_alignment_end, d2_sequence_alignment, d2_sequence_alignment_aa, d2_germline_alignment, d2_germline_alignment_aa.

  8. Updated field definitions with more concise V(D)J call descriptions.
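A record-level check for two of these changes might look like the sketch below: it enforces the new locus vocabulary and treats sequence_id, rather than the deprecated rearrangement_id, as the unique key. The function name and the plain-dict row format are hypothetical, and treating a missing locus as an error may be stricter than the schema itself.

```python
# Controlled vocabulary for the locus field, as listed in the v1.3.0 notes.
LOCUS_VOCABULARY = {"IGH", "IGI", "IGK", "IGL", "TRA", "TRB", "TRD", "TRG"}


def check_rearrangements(rows):
    """Return a list of problems found in an iterable of row dicts."""
    problems = []
    seen_ids = set()
    for n, row in enumerate(rows, start=1):
        seq_id = row.get("sequence_id")
        # sequence_id is now the primary unique identifier for a record.
        if not seq_id:
            problems.append(f"row {n}: missing sequence_id")
        elif seq_id in seen_ids:
            problems.append(f"row {n}: duplicate sequence_id {seq_id}")
        else:
            seen_ids.add(seq_id)
        locus = row.get("locus")
        if locus not in LOCUS_VOCABULARY:
            problems.append(f"row {n}: locus {locus!r} not in vocabulary")
        if "rearrangement_id" in row:
            problems.append(f"row {n}: rearrangement_id is deprecated")
    return problems
```

For example, two rows sharing sequence_id "s1", the second with locus "IGX" and a rearrangement_id column, produce three problems, all reported against row 2.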

Alignment Schema:

  1. Deprecated the rearrangement_set_id and germline_database fields.

  2. Added the data_processing_id field.

Study Schema:

  1. Added the study_type field containing an ontology defined term for the study design.

Subject Schema:

  1. Deprecated the organism field in favor of the new species field.

  2. Deprecated the age field.

  3. Introduced age ranges: age_min, age_max, and age_unit.
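With the single age field deprecated, one way to record an exact age is a zero-width range where age_min equals age_max. The sketch below builds such a Subject fragment; representing age_unit as an id/label Ontology pair and the default unit term are assumptions, not values taken from the schema.

```python
def age_range(age_min, age_max=None, unit_label="year", unit_id="UO:0000036"):
    """Build age_min/age_max/age_unit fields; a single age gives min == max."""
    if age_max is None:
        age_max = age_min  # exact age expressed as a zero-width range
    if age_min > age_max:
        raise ValueError("age_min must not exceed age_max")
    return {
        "age_min": age_min,
        "age_max": age_max,
        # Assumed Ontology-style unit; check the schema for the expected terms.
        "age_unit": {"id": unit_id, "label": unit_label},
    }


print(age_range(30, 40))  # an age range
print(age_range(52))      # an exact age
```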

Diagnosis Schema:

  1. Changed the type of the disease_diagnosis field from string to Ontology.

Sample Schema:

  1. Changed the type of the tissue field from string to Ontology.

CellProcessing Schema:

  1. Changed the type of the cell_subset field from string to Ontology.

  2. Introduced the cell_species field which denotes the species from which the analyzed cells originate.
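Because disease_diagnosis, tissue, and cell_subset changed from plain strings to Ontology values, older metadata needs migrating. One cautious approach, sketched below on a flattened record for brevity (in the schema these fields live in Diagnosis, Sample, and CellProcessing), keeps the old free text as the label and leaves the id null until a curator supplies a term. The helper, the curation table, and the id/label shape of an Ontology value are assumptions.

```python
# Fields whose type changed from string to Ontology in v1.3.0.
ONTOLOGY_FIELDS = ("disease_diagnosis", "tissue", "cell_subset")


def upgrade_ontology_fields(record, known_terms=None):
    """Return a copy of record with string values wrapped as Ontology objects.

    known_terms optionally maps free-text labels to CURIEs (hypothetical
    curation table); unmapped labels get id=None.
    """
    known_terms = known_terms or {}
    upgraded = dict(record)
    for field in ONTOLOGY_FIELDS:
        value = upgraded.get(field)
        if isinstance(value, str):  # values already upgraded are left alone
            upgraded[field] = {"id": known_terms.get(value), "label": value}
    return upgraded


old = {"tissue": "bone marrow", "cell_subset": "naive B cell"}
new = upgrade_ontology_fields(old, {"bone marrow": "UBERON:0002371"})
print(new)
```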

NucleicAcidProcessing Schema:

  1. Defined the template_class field as type string.

  2. Added a controlled vocabulary to the library_generation_method field.

  3. Changed the controlled vocabulary terms of the complete_sequences field, replacing "complete & untemplated" with "complete+untemplated" and adding "mixed".

  4. Added the pcr_target field referencing the new PCRTarget schema object.
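The complete_sequences vocabulary change can be handled with a small lookup when upgrading older metadata. In the sketch below, only the renamed term and "mixed" come from these notes; the remaining terms ("partial", "complete") are assumptions about the rest of the vocabulary.

```python
# Assumed v1.3 vocabulary; only the renamed term and "mixed" come from the notes.
COMPLETE_SEQUENCES_TERMS = {"partial", "complete", "complete+untemplated", "mixed"}
RENAMED_TERMS = {"complete & untemplated": "complete+untemplated"}


def upgrade_complete_sequences(value):
    """Map a pre-v1.3 complete_sequences term onto the v1.3 vocabulary."""
    value = RENAMED_TERMS.get(value, value)
    if value not in COMPLETE_SEQUENCES_TERMS:
        raise ValueError(f"unrecognized complete_sequences term: {value!r}")
    return value
```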

SequencingRun Schema:

  1. Added the sequencing_run_id field, which serves as the object identifier field.

  2. Added the sequencing_files field which links to the RawSequenceData schema objects defining the raw read data.

RawSequenceData Schema:

  1. Added the file_type field defining the sequence file type. This field is a controlled vocabulary restricted to: fasta, fastq.

  2. Added the paired_read_length field defining mate-pair read lengths.

  3. Defined the read_direction and paired_read_direction fields as type string.
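A quick consistency check over these RawSequenceData fields might look like the sketch below. The field name paired_filename and the rule that paired reads should report paired_read_length are assumptions for illustration, not schema rules.

```python
FILE_TYPES = {"fasta", "fastq"}  # controlled vocabulary from the v1.3.0 notes


def check_raw_sequence_data(entry):
    """Return a list of problems in a RawSequenceData-like dict."""
    problems = []
    if entry.get("file_type") not in FILE_TYPES:
        problems.append(f"file_type {entry.get('file_type')!r} not in {sorted(FILE_TYPES)}")
    for key in ("read_direction", "paired_read_direction"):
        value = entry.get(key)
        # Both direction fields are defined as strings (null tolerated here).
        if value is not None and not isinstance(value, str):
            problems.append(f"{key} must be a string")
    # Hypothetical pairing rule: paired reads should report their length.
    if entry.get("paired_filename") and entry.get("paired_read_length") is None:
        problems.append("paired_read_length missing for paired reads")
    return problems
```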

DataProcessing Schema:

  1. Replaced the SoftwareProcessing object.

  2. Added data_processing_id, primary_annotation, data_processing_files, germline_database and analysis_provenance_id fields.
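Rearrangement records point to their metadata through repertoire_id and data_processing_id. The sketch below resolves that link for one record and, when the record names no data_processing_id, falls back to the entry flagged primary_annotation. The nested layout (a data_processing list inside each repertoire) and the fallback policy are assumptions for illustration.

```python
def find_data_processing(repertoires, rearrangement):
    """Return the DataProcessing dict that produced a rearrangement record."""
    by_id = {rep["repertoire_id"]: rep for rep in repertoires}
    rep_id = rearrangement.get("repertoire_id")
    if rep_id not in by_id:
        raise KeyError(f"unknown repertoire_id {rep_id!r}")
    steps = by_id[rep_id].get("data_processing", [])
    wanted = rearrangement.get("data_processing_id")
    if wanted is not None:
        # An explicit link must match exactly; no silent fallback.
        return next((s for s in steps if s.get("data_processing_id") == wanted), None)
    # Assumed fallback when the record names no data_processing_id.
    return next((s for s in steps if s.get("primary_annotation")), None)
```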

Version 1.2.1: Oct 5, 2018

Minor patch release.

  1. Schema gene vs segment terminology corrections

  2. Added Info object

  3. Updated cell_subset URL in AIRR schema

Version 1.2.0: Aug 18, 2018

Peer-reviewed release of the Rearrangement schema.

  1. Definition change for the coordinate fields of the Rearrangement and Alignment schemas. Coordinates are now defined as 1-based closed intervals, instead of 0-based half-open intervals (as previously defined in v1.1 of the schema).

  2. Removed foreign study_id fields

  3. Introduced keywords_study field
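The coordinate change in item 1 shifts only the start: a 1-based closed interval [start, end] covers the same bases as the 0-based half-open interval [start - 1, end). A minimal conversion, checked with a Python slice, is sketched below; the function names are hypothetical.

```python
def closed1_to_halfopen0(start, end):
    """Convert 1-based closed [start, end] to 0-based half-open [start-1, end)."""
    return start - 1, end


def halfopen0_to_closed1(start, end):
    """Convert 0-based half-open [start, end) to 1-based closed [start+1, end]."""
    return start + 1, end


sequence = "ACGTACGTAC"
v_start, v_end = 3, 6  # bases 3..6 inclusive, 1-based
s, e = closed1_to_halfopen0(v_start, v_end)
print(sequence[s:e])  # GTAC
```

Both forms describe the same 4 bases: end - start + 1 in the closed form, end - start in the half-open form.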

Version 1.1.0: May 3, 2018

Initial public release of the Rearrangement and Alignment schemas.

  1. Added required and nullable constraints to the AIRR schema.

  2. Schema definitions for MiAIRR attributes and ontology.

  3. Introduced an x-airr object indicating whether a field is required by MiAIRR.

  4. Renamed rearrangement_set_id to data_processing_id.

  5. Renamed study_description to study_type.

  6. Added physical_quantity format.

  7. Moved raw sequencing files into a separate schema object.

  8. Renamed the Attributes object.

  9. Added primary_annotation and repertoire_id.

  10. Added diagnosis to repertoire object.

  11. Added ontology for organism.

  12. Added more detailed specification of sequencing_run, repertoire and rearrangement.

  13. Added repertoire schema.

  14. Renamed definitions.yaml to airr-schema.yaml.

  15. Removed c_call, c_score and c_cigar from required as this is not typical reference aligner output.

  16. Renamed vdj_score, vdj_identity, vdj_evalue, and vdj_cigar to score, identity, evalue, and cigar.

  17. Added missing c_identity and c_evalue fields to Rearrangement spec.

  18. Swapped order of N and S operators in CIGAR string.

  19. Cleaned up some descriptions for consistency in the Rearrangement spec.

  20. Removed repeated objects from definitions.yaml.

  21. Added Alignment object to definitions.yaml.

  22. Updated the MiAIRR format consistency check TSV with the junction change.

  23. Changed definition from functional to productive.
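Items 4 and 16 are field renames. When upgrading rows written against earlier drafts, a rename table applied to each row's keys is one straightforward approach. The table below covers only those two items, and the helper is hypothetical.

```python
# Renames listed in the v1.1.0 notes (items 4 and 16).
RENAMES = {
    "rearrangement_set_id": "data_processing_id",
    "vdj_score": "score",
    "vdj_identity": "identity",
    "vdj_evalue": "evalue",
    "vdj_cigar": "cigar",
}


def rename_fields(row, renames=RENAMES):
    """Return a copy of row with old field names replaced by new ones."""
    # Refuse to guess when a row already carries both the old and new name.
    conflicts = [old for old, new in renames.items() if old in row and new in row]
    if conflicts:
        raise ValueError(f"old and new names both present: {conflicts}")
    return {renames.get(key, key): value for key, value in row.items()}
```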

Version 1.0.1: Jan 9, 2018

MiAIRR v1 official release and initial draft of Rearrangement and Alignment schemas.

Python Library Release Notes

Version 1.3.1: October 13, 2020

  1. Refactored merge_rearrangement to allow for a larger number of files.

  2. Improved error handling in format validation operations.
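One way to merge many Rearrangement TSV files without keeping them all open or in memory is to read each header first, then stream rows from each file in turn under a single union header. The sketch below uses only the standard csv module with in-memory files; it is not the airr library's merge_rearrangement implementation, and writing missing columns as empty strings is an assumption.

```python
import csv
import io


def merge_tsv(inputs, output):
    """Stream rows from several TSV sources into one TSV with a union header.

    inputs: zero-argument callables that each return an open text file,
    so only one input is open at a time.
    """
    # First pass: read only the header line of each file.
    fields = []
    for open_input in inputs:
        with open_input() as handle:
            header = next(csv.reader(handle, delimiter="\t"), [])
        for name in header:
            if name not in fields:
                fields.append(name)

    writer = csv.DictWriter(output, fieldnames=fields, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    # Second pass: copy rows one at a time; absent columns become "".
    for open_input in inputs:
        with open_input() as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                writer.writerow(row)


a = "sequence_id\tlocus\ns1\tIGH\n"
b = "sequence_id\tlocus\tv_call\ns2\tIGK\tIGKV1-5*01\n"
out = io.StringIO()
merge_tsv([lambda: io.StringIO(a), lambda: io.StringIO(b)], out)
print(out.getvalue())
```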

Version 1.3.0: May 30, 2020

  1. Updated schema set to v1.3.

  2. Added load_repertoire, write_repertoire, and validate_repertoire to airr.interface to read, write and validate Repertoire metadata, respectively.

  3. Added repertoire_template to airr.interface which will return a complete repertoire object where all fields have null values.

  4. Added validate_object to airr.schema that will validate a single repertoire object against the schema.

  5. Extended the airr-tools command-line program to validate both rearrangement and repertoire files.
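The idea behind repertoire_template, a complete repertoire object whose fields are all null, can be imitated as sketched below: a nested outline is walked and every leaf is replaced with None. The outline is a tiny hypothetical subset of Repertoire fields, and this is not how airr.interface builds its template.

```python
import json

# Hypothetical outline of a few Repertoire fields: leaf values name a type,
# nested dicts are objects, and one-element lists mark arrays of objects.
OUTLINE = {
    "repertoire_id": "string",
    "study": {"study_id": "string", "study_title": "string"},
    "subject": {"subject_id": "string", "age_min": "number", "age_max": "number"},
    "sample": [{"sample_id": "string", "tissue": "Ontology"}],
}


def null_template(outline):
    """Return a copy of outline with every leaf value set to None."""
    if isinstance(outline, dict):
        return {key: null_template(value) for key, value in outline.items()}
    if isinstance(outline, list):
        return [null_template(item) for item in outline]
    return None


template = null_template(OUTLINE)
print(json.dumps(template, indent=2))
```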

Version 1.2.1: October 5, 2018

  1. Fixed a bug in the python reference library causing start coordinate values to be empty in some cases when writing data.

Version 1.2.0: August 17, 2018

  1. Updated schema set to v1.2.

  2. Several improvements to the validate_rearrangement function.

  3. Changed the behavior of all airr.interface functions to accept a file path (string) to a single Rearrangement TSV, instead of requiring a file handle as input.

  4. Added base argument to RearrangementReader and RearrangementWriter to support optional conversion of 1-based closed intervals in the TSV to python-style 0-based half-open intervals. Defaults to conversion.

  5. Added the custom exception ValidationError for handling validation checks.

  6. Added the validate argument to RearrangementReader which will raise a ValidationError exception when reading files with missing required fields or invalid values for known field types.

  7. Added the validate argument to all type conversion methods in Schema, which will now raise a ValidationError exception for values that cannot be converted when set to True. When set to False (default), the previous behavior of assigning None as the converted value is retained.

  8. Added validate_header and validate_row methods to Schema and removed validation methods from RearrangementReader.

  9. Removed automatic closure of file handle upon reaching the iterator end in RearrangementReader.
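Item 7 describes conversion methods that either return None or raise ValidationError depending on a validate flag. The pattern is sketched below with a locally defined exception and a hypothetical to_integer function; it mirrors the described behavior but is not the library's code.

```python
class ValidationError(Exception):
    """Local stand-in for the library's ValidationError."""


def to_integer(value, validate=False):
    """Convert value to int; on failure raise if validate, else return None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        if validate:
            raise ValidationError(f"cannot convert {value!r} to integer") from None
        return None


print(to_integer("42"))   # 42
print(to_integer("n/a"))  # None
```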

Version 1.1.0: May 1, 2018

Initial release.

R Library Release Notes

Version 1.3.0: May 26, 2020

  1. Updated schema set to v1.3.

  2. Added info slot to Schema object containing general schema information.

Version 1.2.0: August 17, 2018

  1. Updated schema set to v1.2.

  2. Changed defaults to base="1" for read and write functions.

  3. Updated example TSV file with coordinate changes, addition of germline_alignment data and simplification of sequence_id values.

Version 1.1.0: May 1, 2018

Initial release.