Schema Release Notes
Version 1.5.1: June 2, 2024
Version 1.5 patch release.
Corrected schema version number in the Info object.
Version 1.5.0: August 29, 2023
Version 1.5 schema release.
General Schema Changes:
Fixed synchronization errors between the OpenAPI v2 and v3 versions of the AIRR Schema (airr-schema.yaml and airr-schema-openapi3.yaml).
Set the default value of x-airr.miairr attributes to defined.
Converted all x-airr.format attribute values to snake_case, which specifically impacts any instance of controlled vocabulary or physical quantity.
Corrected numerous instances of missing x-airr.miairr and x-airr.identifier attributes.
Replaced the x-airr.adc-api-optional attribute with x-airr.adc-query-support in multiple fields.
Added “IGI” as a valid value to the locus enum fields in multiple schema.
Added null as a valid value to all nullable enum fields.
Removed discriminator: AIRR from all object definitions.
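The snake_case conversion of x-airr.format values noted above can be sketched in Python as follows. This is only an illustration: the helper name to_snake_case is hypothetical and is not part of the AIRR schema or its reference libraries. It assumes the old values were lowercase words separated by spaces.

```python
def to_snake_case(value: str) -> str:
    """Lowercase the value and join its whitespace-separated words with underscores."""
    return "_".join(value.strip().lower().split())


# The two format values named in the release note.
for old_value in ("controlled vocabulary", "physical quantity"):
    print(old_value, "->", to_snake_case(old_value))
```

A tool that reads x-airr.format from both pre-1.5 and 1.5 schema files could apply such a normalization before comparing values.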
Germline and Genotype Schema:
Clarified the descriptions of multiple fields in the Germline and Genotype schema.
Modified x-airr: nullable and x-airr: identifier values on multiple fields in the Germline and Genotype schema.
Removed the alignment field and added the unaligned_sequence, aligned_sequences, and alignment_labels fields to the SequenceDelineationV object.
Converted the enum values in the inference_type field of AlleleDescription to snake_case.
Added the allele_similarity_cluster_designation and allele_similarity_cluster_member_id fields to AlleleDescription.
Moved the nested objects DocumentedAllele, UndocumentedAllele, and DeletedGenes out of Genotype and defined them as top-level objects referenced by the documented_alleles, undocumented_alleles, and deleted_genes fields, respectively.
Moved the nested object MHCAllele out of MHCGenotype and defined it as a top-level object referenced by the mhc_alleles field.
Single-cell Schema:
Added the property_type field to the CellExpression object.
Moved the nested ReceptorReactivity object out of Receptor and defined it as a top-level object referenced by the reactivity_measurements field.
Subject Schema:
Removed the nested references to GenotypeSet and MHCGenotypeSet in the genotype field and modified the definition to point to a top-level SubjectGenotype object defining these references.
DataProcessing Schema:
Clarified the description of quality_thresholds to indicate that quality filtering is not mandatory.
Version 1.4.1: August 27, 2022
Version 1.4 schema release.
New General Purpose Schema:
Introduced the experimental DataFile object, which defines a JSON file holding Repertoire metadata, data processing analysis objects, or any object in the AIRR Data Model.
Introduced the experimental RepertoireGroup Schema for describing collections of repertoires to be analyzed together.
Introduced the experimental InfoObject Schema, which provides information about data and ADC API responses.
Introduced the experimental TimePoint Schema for defining the time point at which an observation or other action was performed.
New Germline and Genotype Schema:
The following experimental schema were introduced to support storage of VDJ germline reference sequences, VDJ genotypes, and MHC genotypes:
GermlineSet: Defines a collection of AlleleDescriptions from the same strain or species.
AlleleDescription: Details of a putative or confirmed Ig receptor gene/allele inferred from one or more observations.
RearrangedSequence: Details of a directly observed rearranged sequence or an inference from rearranged sequences contributing support for a gene or allele.
UnrearrangedSequence: Details of an unrearranged sequence contributing support for a gene or allele.
SequenceDelineationV: Delineation of a V-gene in a particular system.
GenotypeSet: Defines a collection of VDJ genotypes for a given subject.
Genotype: Enumerates the alleles and gene deletions inferred in a single subject for a single locus.
MHCGenotypeSet: Defines a collection of MHC genotypes for a given subject.
MHCGenotype: Details the genotype of major histocompatibility complex (MHC) class I, class II and non-classical loci.
Acknowledgement: Defines contributors to the germline or genotype description.
New Single-cell Schema:
The following experimental schema were introduced to improve support for single-cell data and extend the Cell schema.
CellExpression: Defines a container to store single-cell expression level measurements.
Receptor: Describes a complete receptor protein sequence and its reactivity.
Rearrangement Schema:
Added the optional fields v_frameshift, j_frameshift, d_frame and d2_frame defining annotations related to alignment reading frames.
Added the optional field umi_count to represent the count of distinct UMIs for a sequence.
Modified the definition of duplicate_count to remove ambiguity with the new umi_count field in a single-cell context. There is now a distinction between duplicate observed sequences (duplicate_count) and UMIs (umi_count).
The optional quality and quality_alignment alignment fields were added to store Phred quality scores for base calls in the sequence and sequence_alignment fields, respectively.
The following optional fields were added to denote constant region (c_call) alignment positions: c_sequence_start, c_sequence_end, c_germline_start, c_germline_end, c_alignment_start, c_alignment_end.
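As an illustration of how the new quality field pairs with sequence, the sketch below decodes a quality string into per-base Phred scores. It assumes the common Phred+33 ASCII encoding, which the note above does not state. The function name decode_phred and the example record values are hypothetical.

```python
from typing import List


def decode_phred(quality: str, offset: int = 33) -> List[int]:
    """Decode an ASCII-encoded quality string into integer Phred scores.

    The default offset of 33 (Phred+33) is an assumption, not taken from the release note.
    """
    return [ord(ch) - offset for ch in quality]


# Hypothetical record: one quality character per base call in `sequence`.
record = {"sequence": "ACGT", "quality": "I5+!"}
scores = decode_phred(record["quality"])
if len(scores) != len(record["sequence"]):
    raise ValueError("quality and sequence must have the same length")
print(scores)
```

The same decoding would apply to quality_alignment against sequence_alignment, under the same encoding assumption.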
Study Schema:
Added the optional field study_contact to store contact information for the primary study contact.
Modified the enumerated values supported by keywords_study to the following set: contains_ig, contains_tr, contains_paired_chain, contains_schema_rearrangement, contains_schema_clone, contains_schema_cell, contains_schema_receptor.
Added the optional fields adc_publish_date and adc_update_date that timestamp AIRR Data Commons initial publication and last update, respectively.
Subject Schema:
Added the optional genotype field linking to the new GenotypeSet and MHCGenotypeSet objects.
Sample Schema:
Added the required field collection_time_point_relative_unit defining the units for the sample collection timestamp.
Modified the type of the field collection_time_point_relative from a string to a number defined in combination with the new unit ontology field collection_time_point_relative_unit.
NucleicAcidProcessing Schema:
Added the required field template_amount_unit defining the units for the input template quantification.
Modified the type of the template_amount field from a string to a number defined in combination with the new unit ontology field template_amount_unit.
Clone Schema:
Added the optional clone_count field to specify the absolute count of clonal members.
Added the optional umi_count field to specify the total UMI count of all clonal members.
Cell Schema:
Removed the field expression_tabular whose functionality has been replaced by the new CellExpression schema.
Version 1.3.1: October 13, 2020
Version 1.3 documentation patch release.
Alignment Schema:
Added the deprecation tags for rearrangement_id, which were accidentally left out of the v1.3.0 release.
Version 1.3.0: May 28, 2020
Version 1.3 schema release.
New Schema:
Introduced the Repertoire Schema for describing study metadata.
Introduced the PCRTarget Schema for describing primer target locations.
Introduced the SampleProcessing Schema for describing experimental processing steps for a sample.
Replaced the SoftwareProcessing schema with the DataProcessing schema.
Introduced experimental schema for clonal clusters, lineage trees, tree nodes, and cells as Clone, Tree, Node, and Cell objects, respectively.
General Updates:
Added multiple additional attributes to a large number of schema properties as AIRR extension attributes in the x-airr field. The new Attributes object contains definitions for these x-airr field attributes.
Added the top level required property to all relevant schema objects.
Added the title attribute containing the short, descriptive name to all relevant schema object fields.
Added an example attribute containing an example data value to multiple schema object fields.
AIRR Data Commons API:
Added OpenAPI V2 specification (specs/adc-api.yaml) for AIRR Data Commons API major version 1.
Ontology Support:
Added Ontology and CURIEResolution objects to support ontologies.
Added vocabularies/ontologies as JSON string for: Cell subset, Target substrate, Library generation method, Complete sequences, Physical linkage of different loci.
Rearrangement Schema:
Added the complete_vdj field to annotate whether a V(D)J alignment was full length.
Added the junction_length_aa field defining the length of the junction amino acid sequence.
Added the repertoire_id, sample_processing_id, and data_processing_id fields to serve as linkers to the appropriate metadata objects.
Added a controlled vocabulary to the locus field: IGH, IGI, IGK, IGL, TRA, TRB, TRD, TRG.
Deprecated the rearrangement_set_id and germline_database fields.
Deprecated the rearrangement_id field and made the sequence_id field the primary unique identifier for a rearrangement record, both in files and data repositories.
Added support for secondary D gene rearrangement through the additional fields: d2_call, d2_score, d2_identity, d2_support, d2_cigar, np3, np3_aa, np3_length, n3_length, p5d2_length, p3d2_length, d2_sequence_start, d2_sequence_end, d2_germline_start, d2_germline_end, d2_alignment_start, d2_alignment_end, d2_sequence_alignment, d2_sequence_alignment_aa, d2_germline_alignment, d2_germline_alignment_aa.
Updated field definitions with more concise V(D)J call descriptions.
Alignment Schema:
Deprecated the rearrangement_set_id and germline_database fields.
Added the data_processing_id field.
Study Schema:
Added the study_type field containing an ontology defined term for the study design.
Subject Schema:
Deprecated the organism field in favor of the new species field.
Deprecated the age field.
Introduced age ranges: age_min, age_max, and age_unit.
Diagnosis Schema:
Changed the type of the disease_diagnosis field from string to Ontology.
Sample Schema:
Changed the type of the tissue field from string to Ontology.
CellProcessing Schema:
Changed the type of the cell_subset field from string to Ontology.
Introduced the cell_species field which denotes the species from which the analyzed cells originate.
NucleicAcidProcessing Schema:
Defined the template_class field as type string.
Added a controlled vocabulary to the library_generation_method field.
Changed the controlled vocabulary terms of complete_sequences, replacing complete & untemplated with complete+untemplated and adding mixed.
Added the pcr_target field referencing the new PCRTarget schema object.
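For data written against an earlier schema version, the complete_sequences term change above amounts to a one-to-one substitution. A minimal Python sketch, assuming values are stored as plain strings, follows. The names TERM_REPLACEMENTS and update_complete_sequences are hypothetical and not part of any AIRR library.

```python
# Old term -> new term, as stated in the release note.
TERM_REPLACEMENTS = {"complete & untemplated": "complete+untemplated"}


def update_complete_sequences(value: str) -> str:
    """Return the v1.3 term for a complete_sequences value, leaving other terms unchanged."""
    return TERM_REPLACEMENTS.get(value, value)


print(update_complete_sequences("complete & untemplated"))
print(update_complete_sequences("mixed"))
```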
SequencingRun Schema:
Added the sequencing_run_id field which serves as the object identifier field.
Added the sequencing_files field which links to the RawSequenceData schema objects defining the raw read data.
RawSequenceData Schema:
Added the file_type field defining the sequence file type. This field is a controlled vocabulary restricted to: fasta, fastq.
Added the paired_read_length field defining mate-pair read lengths.
Defined the read_direction and paired_read_direction fields as type string.
DataProcessing Schema:
Replaces the SoftwareProcessing object.
Added data_processing_id, primary_annotation, data_processing_files, germline_database and analysis_provenance_id fields.
Version 1.2.1: Oct 5, 2018
Minor patch release.
Schema gene vs segment terminology corrections
Added Info object
Updated cell_subset URL in AIRR schema
Version 1.2.0: Aug 18, 2018
Peer-reviewed release of the Rearrangement schema.
Definition change for the coordinate fields of the Rearrangement and Alignment schema. Coordinates are now defined as 1-based closed intervals, instead of 0-based half-open intervals (as previously defined in v1.1 of the schema).
Removed foreign study_id fields
Introduced keywords_study field
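The coordinate definition change in this release can be illustrated with a small conversion. A 0-based half-open interval [start, end) covers the same positions as the 1-based closed interval [start + 1, end]. The Python sketch below uses a hypothetical helper name, to_one_based_closed, and a made-up sequence to check the equivalence.

```python
from typing import Tuple


def to_one_based_closed(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open interval [start, end) to a 1-based closed interval."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return start + 1, end


# Hypothetical example: both conventions select the same bases.
sequence = "ACGTACGT"
start0, end0 = 2, 5                      # v1.1 style: 0-based, half-open
start1, end1 = to_one_based_closed(start0, end0)
print(start1, end1)
print(sequence[start0:end0] == sequence[start1 - 1:end1])
```

Only the start value shifts. The interval length is end - start in the old convention and end - start + 1 in the new one, which gives the same count of positions.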
Version 1.1.0: May 3, 2018
Initial public release of the Rearrangement and Alignment schemas.
Added required and nullable constraints to AIRR schema.
Schema definitions for MiAIRR attributes and ontology.
Introduction of an x-airr object indicating if field is required by MiAIRR.
Rename rearrangement_set_id to data_processing_id.
Rename study_description to study_type.
Added physical_quantity format.
Raw sequencing files into separate schema object.
Rename Attributes object.
Added primary_annotation and repertoire_id.
Added diagnosis to repertoire object.
Added ontology for organism.
Added more detailed specification of sequencing_run, repertoire and rearrangement.
Added repertoire schema.
Rename definitions.yaml to airr-schema.yaml.
Removed c_call, c_score and c_cigar from required as this is not typical reference aligner output.
Renamed vdj_score, vdj_identity, vdj_evalue, and vdj_cigar to score, identity, evalue, and cigar.
Added missing c_identity and c_evalue fields to Rearrangement spec.
Swapped order of N and S operators in CIGAR string.
Some description clean up for consistency in Rearrangement spec.
Remove repeated objects in definitions.yaml.
Added Alignment object to definitions.yaml.
Updated MiAIRR format consistency check TSV with junction change.
Changed definition from functional to productive.
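The vdj_* field renames listed above could be applied to older records with a simple key mapping. The following Python sketch assumes each record is held as a dictionary keyed by field name. The names FIELD_RENAMES and rename_fields, and the example values, are hypothetical.

```python
from typing import Any, Dict

# Old field name -> new field name, taken from the rename entry in this release.
FIELD_RENAMES = {
    "vdj_score": "score",
    "vdj_identity": "identity",
    "vdj_evalue": "evalue",
    "vdj_cigar": "cigar",
}


def rename_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with renamed keys; other keys pass through unchanged."""
    return {FIELD_RENAMES.get(key, key): value for key, value in record.items()}


# Hypothetical pre-1.1.0 record.
old_record = {"sequence_id": "seq1", "vdj_score": 120.5, "vdj_cigar": "20S300M"}
print(rename_fields(old_record))
```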
Version 1.0.1: Jan 9, 2018
MiAIRR v1 official release and initial draft of Rearrangement and Alignment schemas.