Schema Release Notes
Version 2.0.0: June 5, 2026
Version 2.0 major schema release.
General Schema Changes:
Reorganized time, physical quantities, and contributor tracking by introducing reusable component schemas: TimeInterval, PhysicalQuantity, TimeQuantity, Contributor, and ContributorContribution.
Replaced the Acknowledgement schema with the more robust Contributor schema, which incorporates the CRediT contributor taxonomy roles and ROR institutional identifiers.
Expanded CURIEMap with new prefixes for geographic locations and germline repositories: GAZ, IEDB_EPITOPE, IMGT_GERMLINESET, and OGRDB_GERMLINESET.
Added the adc-api-optional attribute to the Attributes schema to designate optional API query fields.
Introduced data package management schemas: FileObject, DataSet, and Manifest to support metadata bundling for groups of files.
Updated the DataFile schema to include arrays for Node and Manifest records, while removing direct nesting of CellExpression.
Removed the obsolete Alignment schema definition.
Time and Quantity Reorganization:
Simplified the TimePoint schema fields by shortening prefixes from time_point_ to time_ (e.g., time_label, time_value, and time_unit).
Converted fields representing single numbers or strings with separate units into unified object references:
    Subject.age now uses TimeInterval (replacing age_min, age_max, and age_unit).
    Diagnosis.disease_length now uses TimeQuantity (replacing the generic string).
    Sample.collection_time_point_relative now uses a TimePoint object reference (replacing separate number, unit, and reference fields).
    NucleicAcidProcessing.template_amount now uses PhysicalQuantity (replacing separate number and unit fields).
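The TimePoint prefix change can be applied to existing records mechanically. The following is a minimal sketch, not an official migration tool: the helper name rename_time_point_keys and the sample record are hypothetical, and the plain string used for the unit is only a stand-in for whatever value the field actually holds.

```python
# Hypothetical helper: rename v1.6-style TimePoint keys to their v2.0 names.
OLD_PREFIX = "time_point_"
NEW_PREFIX = "time_"


def rename_time_point_keys(time_point: dict) -> dict:
    """Return a copy of a TimePoint record with time_point_ shortened to time_."""
    renamed = {}
    for key, value in time_point.items():
        if key.startswith(OLD_PREFIX):
            # Keep everything after the old prefix, e.g. "label", "value", "unit".
            key = NEW_PREFIX + key[len(OLD_PREFIX):]
        renamed[key] = value
    return renamed


# Illustrative record only; the values are made up for this sketch.
old_record = {
    "time_point_label": "Pre-vaccination",
    "time_point_value": -14,
    "time_point_unit": "day",
}
new_record = rename_time_point_keys(old_record)
print(new_record)
```

Keys that do not start with time_point_ pass through unchanged, so the helper can be run over records that already use the shorter names.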
Repertoire Schema:
Reorganized study authorship and contact tracking in the Study schema by introducing a unified contributors array, deprecating individual contact fields (study_contact, collected_by, lab_name, lab_address, and submitted_by).
Changed the type of Study.pub_ids from a single string to an array of strings.
Added a repertoire_type field to the Repertoire schema supporting a controlled vocabulary (observed, simulated, inferred, null).
Added an optional filter object to the RepertoireFilter schema to document how rearrangements or cells were filtered using JSON structures consistent with the ADC API.
Enhanced geographic and demographic tracking by converting Subject.ancestry_population to an Ontology reference and adding Subject.location_birth and Sample.collection_location as new ontology fields.
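For the Study.pub_ids type change, older records holding a single string need to be wrapped in a list. The sketch below is one possible approach under a stated assumption: the release notes do not say how multiple identifiers were written in the old string form, so splitting on commas is a guess, and the function name upgrade_pub_ids and the sample identifiers are hypothetical.

```python
def upgrade_pub_ids(study: dict) -> dict:
    """Return a copy of a Study record with pub_ids stored as a list of strings."""
    upgraded = dict(study)
    pub_ids = upgraded.get("pub_ids")
    if pub_ids is None:
        # Missing or null values are left untouched.
        return upgraded
    if isinstance(pub_ids, str):
        # Assumption: multiple identifiers, if any, were comma-separated.
        upgraded["pub_ids"] = [p.strip() for p in pub_ids.split(",") if p.strip()]
    return upgraded


# Placeholder identifiers, not real publications.
print(upgrade_pub_ids({"study_id": "S1", "pub_ids": "PMID:0000001, PMID:0000002"}))
```

Records whose pub_ids is already a list are returned unchanged, which makes the function safe to apply more than once.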
Rearrangement Schema:
Added the locus_species ontology field to support transgenic or chimeric models where the locus species differs from the host organism.
Added a rearrangement_type field supporting a controlled vocabulary (observed, simulated, inferred, null).
Added reactivity_id and reactivity_ref fields to link rearrangement records directly with single-cell reactivity data.
Formalized the deprecation of rearrangement_id (merged with sequence_id), rearrangement_set_id (replaced by specific identifiers), and germline_database (moved to DataProcessing).
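A record's rearrangement_type can be checked against the controlled vocabulary listed above. This is a minimal sketch rather than a replacement for schema validation; it assumes that the null entry means a JSON null (Python None) and not the literal string "null", and the function name is hypothetical.

```python
# Values taken from the vocabulary listed in the release notes.
# Assumption: "null" denotes a JSON null, represented here as None.
TYPE_VOCABULARY = ("observed", "simulated", "inferred", None)


def is_valid_rearrangement_type(value) -> bool:
    """Return True if value is one of the allowed rearrangement_type terms."""
    return value in TYPE_VOCABULARY


for candidate in ("observed", None, "synthetic"):
    print(candidate, is_valid_rearrangement_type(candidate))
```

The same vocabulary is listed for Repertoire.repertoire_type, so the check could be reused for that field.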
Single-cell Schema:
Renamed the CellExpression schema to Expression and its property_value field to value for clarity and brevity.
Renamed and overhauled the ReceptorReactivity schema into a top-level single-cell data object called Reactivity, adding keys like reactivity_id, cell_id, repertoire_id, and data_processing_id to link directly to individual cells and data processing records.
Removed the reactivity_measurements block from the Receptor schema, shifting reactivity tracking to the new top-level Reactivity object.
Updated the Cell schema by adding cell_subset, cell_phenotype, cell_label, and cell_type fields, while removing the legacy rearrangements array and expression metadata fields.
Added a cell_label free text field to the CellProcessing schema for custom cell annotations not captured by standard ontologies.
Clone and Tree Schema:
Completely restructured the Clone schema to support multi-repertoire clonal analysis across an entire RepertoireGroup.
Replaced clone-level alignment and sequence annotations (such as v_call, d_call, j_call, junction, and individual coordinates) with a required list of Node records and an inferred_ancestor reference.
Embedded phylogenetic trees directly within the Clone object as a Newick-formatted string field (tree), leading to the removal of the separate Tree schema.
Redefined the Node schema to serve as a link between a clone member and its original repertoire, specifying its source via mutually exclusive cell_id or sequence_id fields, along with descriptive properties like node_type and node_class.
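The mutual exclusivity of cell_id and sequence_id on a Node can be checked with a few lines of code. The sketch below is illustrative only: the function name node_source is hypothetical, and because the release notes do not say whether a Node may omit both fields, that case is reported rather than rejected.

```python
def node_source(node: dict) -> str:
    """Report which identifier a Node record uses to point at its source.

    Raises ValueError if both cell_id and sequence_id are set, since the
    two fields are described as mutually exclusive.
    """
    has_cell = node.get("cell_id") is not None
    has_sequence = node.get("sequence_id") is not None
    if has_cell and has_sequence:
        raise ValueError("cell_id and sequence_id are mutually exclusive")
    if has_cell:
        return "cell"
    if has_sequence:
        return "sequence"
    # Assumption: a Node with neither field is reported, not treated as an error.
    return "none"


# Identifier values are placeholders for this sketch.
print(node_source({"cell_id": "cell-1", "sequence_id": None}))
print(node_source({"sequence_id": "seq-7"}))
```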
Version 1.6.0: July 7, 2025
Version 1.6 schema release.
Added minimum value to multiple numeric fields.
Made TimePoint fields label, value and unit unique by prefixing with time_point_.
Renamed Acknowledgement.name to individual_full_name.
Renamed CellExpression.value to property_value.
Moved several RepertoireGroup fields into a separate RepertoireFilter object.
Fixed multiple type and example errors in the various schemas.
Version 1.5.1: June 2, 2024
Version 1.5 patch release.
Corrected schema version number in the Info object.
Version 1.5.0: August 29, 2023
Version 1.5 schema release.
General Schema Changes:
Fixed synchronization errors between the OpenAPI v2 and v3 versions of the AIRR Schema (airr-schema.yaml and airr-schema-openapi3.yaml).
Set the default value of x-airr.miairr attributes to defined.
Converted all x-airr.format attribute values to snake_case, which specifically impacts any instance of controlled vocabulary or physical quantity.
Corrected numerous instances of missing x-airr.miairr and x-airr.identifier attributes.
Replaced x-airr.adc-api-optional attribute with x-airr.adc-query-support in multiple fields.
Added "IGI" as a valid value to the locus enum fields in multiple schema.
Added null as a valid value to all nullable enum fields.
Removed discriminator: AIRR from all object definitions.
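The snake_case conversion of x-airr.format values noted above amounts to replacing spaces with underscores. The following sketch shows one way a client holding older format strings might normalize them; the function name is hypothetical and the code is not taken from the AIRR tooling.

```python
def format_to_snake_case(format_value: str) -> str:
    """Normalize an x-airr.format value to snake_case."""
    # Lowercase, split on any whitespace, and rejoin with underscores.
    return "_".join(format_value.strip().lower().split())


for old_value in ("controlled vocabulary", "physical quantity", "ontology"):
    print(old_value, "->", format_to_snake_case(old_value))
```

Single-word values such as ontology are returned unchanged, and values already in snake_case are unaffected.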
Germline and Genotype Schema:
Clarified the descriptions of multiple fields in the Germline and Genotype schema.
Modified x-airr: nullable and x-airr: identifier values on multiple fields in the Germline and Genotype schema.
Removed the alignment field and added the unaligned_sequence, aligned_sequences, and alignment_labels fields to the SequenceDelineationV object.
Converted the enum values in the inference_type field of AlleleDescription to snake_case.
Added the allele_similarity_cluster_designation and allele_similarity_cluster_member_id fields to AlleleDescription.
Moved the nested objects DocumentedAllele, UndocumentedAllele, and DeletedGenes out of Genotype and defined them as top-level objects referenced by the documented_alleles, undocumented_alleles, and deleted_genes fields, respectively.
Moved the nested object MHCAllele out of MHCGenotype and defined it as a top-level object referenced by the mhc_alleles field.
Single-cell Schema:
Added the property_type field to the CellExpression object.
Moved the nested ReceptorReactivity object out of Receptor and defined it as a top-level object referenced by the reactivity_measurements field.
Subject Schema:
Removed the nested references to GenotypeSet and MHCGenotypeSet in the genotype field and modified the definition to point to a top-level SubjectGenotype object defining these references.
DataProcessing Schema:
Clarified the description of quality_thresholds to indicate that quality filtering is not mandatory.
Version 1.4.1: August 27, 2022
Version 1.4 schema release.
New General Purpose Schema:
Introduced the experimental DataFile object, which defines a JSON file holding Repertoire metadata, data processing analysis objects, or any object in the AIRR Data Model.
Introduced the experimental RepertoireGroup Schema for describing collections of repertoires to be analyzed together.
Introduced the experimental InfoObject Schema, which provides information about data and ADC API responses.
Introduced the experimental TimePoint Schema for defining the time point at which an observation or other action was performed.
New Germline and Genotype Schema:
The following experimental schema were introduced to support storage of VDJ germline reference sequences, VDJ genotypes, and MHC genotypes:
GermlineSet: Defines a collection of AlleleDescriptions from the same strain or species.
AlleleDescription: Details of a putative or confirmed Ig receptor gene/allele inferred from one or more observations.
RearrangedSequence: Details of a directly observed rearranged sequence or an inference from rearranged sequences contributing support for a gene or allele.
UnrearrangedSequence: Details of an unrearranged sequence contributing support for a gene or allele.
SequenceDelineationV: Delineation of a V-gene in a particular system.
GenotypeSet: Defines a collection of VDJ genotypes for a given subject.
Genotype: Enumerates the alleles and gene deletions inferred in a single subject for a single locus.
MHCGenotypeSet: Defines a collection of MHC genotypes for a given subject.
MHCGenotype: Details the genotype of major histocompatibility complex (MHC) class I, class II and non-classical loci.
Acknowledgement: Defines contributors to the germline or genotype description.
New Single-cell Schema:
The following experimental schema were introduced to improve support for single-cell data and extend the Cell schema:
CellExpression: Defines a container to store single-cell expression level measurements.
Receptor: Describes a complete receptor protein sequence and its reactivity.
Rearrangement Schema:
Added the optional fields v_frameshift, j_frameshift, d_frame and d2_frame defining annotations related to alignment reading frames.
Added the optional field umi_count to represent the count of distinct UMIs for a sequence.
Modified the definition of duplicate_count to remove ambiguity with the new umi_count field in a single-cell context. There is now a distinction between duplicate observed sequences (duplicate_count) and UMIs (umi_count).
The optional quality and quality_alignment alignment fields were added to store Phred quality scores for base calls in the sequence and sequence_alignment fields, respectively.
The following optional fields were added to denote constant region (c_call) alignment positions: c_sequence_start, c_sequence_end, c_germline_start, c_germline_end, c_alignment_start, c_alignment_end.
Study Schema:
Added the optional field study_contact to store contact information for the primary study contact.
Modified the enumerated values supported by keywords_study to the following set: contains_ig, contains_tr, contains_paired_chain, contains_schema_rearrangement, contains_schema_clone, contains_schema_cell, contains_schema_receptor.
Added the optional fields adc_publish_date and adc_update_date that timestamp AIRR Data Commons initial publication and last update, respectively.
Subject Schema:
Added the optional genotype field linking to the new GenotypeSet and MHCGenotypeSet objects.
Sample Schema:
Added the required field collection_time_point_relative_unit defining the units for the sample collection timestamp.
Modified the type of the field collection_time_point_relative from a string to a number defined in combination with the new unit ontology field collection_time_point_relative_unit.
NucleicAcidProcessing Schema:
Added the required field template_amount_unit defining the units for the input template quantification.
Modified the type of the template_amount field from a string to a number defined in combination with the new unit ontology field template_amount_unit.
Clone Schema:
Added the optional clone_count field to specify absolute count of clonal members.
Added the optional umi_count field to specify the total UMI count of all clonal members.
Cell Schema:
Removed the field expression_tabular whose functionality has been replaced by the new CellExpression schema.
Version 1.3.1: October 13, 2020
Version 1.3 documentation patch release.
Alignment Schema:
Added the deprecation tags for rearrangement_id, which were accidentally left out of the v1.3.0 release.
Version 1.3.0: May 28, 2020
Version 1.3 schema release.
New Schema:
Introduced the Repertoire Schema for describing study meta data.
Introduced the PCRTarget Schema for describing primer target locations.
Introduced the SampleProcessing Schema for describing experimental processing steps for a sample.
Replaced the SoftwareProcessing schema with the DataProcessing schema.
Introduced experimental schema for clonal clusters, lineage trees, tree nodes, and cells as Clone, Tree, Node, and Cell objects, respectively.
General Updates:
Added multiple additional attributes to a large number of schema properties as AIRR extension attributes in the x-airr field. The new Attributes object contains definitions for these x-airr field attributes.
Added the top level required property to all relevant schema objects.
Added the title attribute containing the short, descriptive name to all relevant schema object fields.
Added an example attribute containing an example data value to multiple schema object fields.
AIRR Data Commons API:
Added OpenAPI V2 specification (specs/adc-api.yaml) for AIRR Data Commons API major version 1.
Ontology Support:
Added Ontology and CURIEResolution objects to support ontologies.
Added vocabularies/ontologies as JSON string for: Cell subset, Target substrate, Library generation method, Complete sequences, Physical linkage of different loci.
Rearrangement Schema:
Added the complete_vdj field to annotate whether a V(D)J alignment was full length.
Added the junction_length_aa field defining the length of the junction amino acid sequence.
Added the repertoire_id, sample_processing_id, and data_processing_id fields to serve as linkers to the appropriate metadata objects.
Added a controlled vocabulary to the locus field: IGH, IGI, IGK, IGL, TRA, TRB, TRD, TRG.
Deprecated the rearrangement_set_id and germline_database fields.
Deprecated the rearrangement_id field and made the sequence_id field the primary unique identifier for a rearrangement record, both in files and data repositories.
Added support for secondary D gene rearrangement through the additional fields: d2_call, d2_score, d2_identity, d2_support, d2_cigar, np3, np3_aa, np3_length, n3_length, p5d2_length, p3d2_length, d2_sequence_start, d2_sequence_end, d2_germline_start, d2_germline_end, d2_alignment_start, d2_alignment_end, d2_sequence_alignment, d2_sequence_alignment_aa, d2_germline_alignment, d2_germline_alignment_aa.
Updated field definitions with more concise V(D)J call descriptions.
Alignment Schema:
Deprecated the rearrangement_set_id and germline_database fields.
Added the data_processing_id field.
Study Schema:
Added the study_type field containing an ontology defined term for the study design.
Subject Schema:
Deprecated the organism field in favor of the new species field.
Deprecated the age field.
Introduced age ranges: age_min, age_max, and age_unit.
Diagnosis Schema:
Changed the type of the disease_diagnosis field from string to Ontology.
Sample Schema:
Changed the type of the tissue field from string to Ontology.
CellProcessing Schema:
Changed the type of the cell_subset field from string to Ontology.
Introduced the cell_species field which denotes the species from which the analyzed cells originate.
NucleicAcidProcessing Schema:
Defined the template_class field as type string.
Added a controlled vocabulary to the library_generation_method field.
Changed the controlled vocabulary terms of complete_sequences, replacing complete & untemplated with complete+untemplated and adding mixed.
Added the pcr_target field referencing the new PCRTarget schema object.
SequencingRun Schema:
Added the sequencing_run_id field which serves as the object identifier field.
Added the sequencing_files field which links to the RawSequenceData schema objects defining the raw read data.
RawSequenceData Schema:
Added the file_type field defining the sequence file type. This field is a controlled vocabulary restricted to: fasta, fastq.
Added the paired_read_length field defining mate-pair read lengths.
Defined the read_direction and paired_read_direction fields as type string.
DataProcessing Schema:
Replaces the SoftwareProcessing object.
Added data_processing_id, primary_annotation, data_processing_files, germline_database and analysis_provenance_id fields.
Version 1.2.1: Oct 5, 2018
Minor patch release.
Schema gene vs segment terminology corrections
Added Info object
Updated cell_subset URL in AIRR schema
Version 1.2.0: Aug 18, 2018
Peer-reviewed release of the Rearrangement schema.
Definition change for the coordinate fields of the Rearrangement and Alignment schema. Coordinates are now defined as 1-based closed intervals, instead of 0-based half-open intervals (as previously defined in v1.1 of the schema).
Removed foreign study_id fields
Introduced keywords_study field
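The coordinate redefinition in this release matters for any tool that stored positions under the v1.1 convention. As a worked sketch, a 0-based half-open interval [start, end) maps to the 1-based closed interval [start + 1, end]. The function name and the example sequence below are hypothetical.

```python
from typing import Tuple


def to_one_based_closed(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open interval [start, end) to 1-based closed.

    An empty half-open interval (end == start) has no closed-interval
    equivalent, so this sketch rejects it.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    # Only the start shifts; the exclusive 0-based end equals the inclusive 1-based end.
    return start + 1, end


sequence = "ACGTACGT"
start0, end0 = 2, 5
start1, end1 = to_one_based_closed(start0, end0)

# Both conventions select the same bases.
print(sequence[start0:end0], (start1, end1), sequence[start1 - 1:end1])
```

The interval length is end - start in the old convention and end - start + 1 in the new one, which gives a quick way to sanity-check converted records.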
Version 1.1.0: May 3, 2018
Initial public release of the Rearrangement and Alignment schemas.
Added required and nullable constraints to AIRR schema.
Schema definitions for MiAIRR attributes and ontology.
Introduction of an x-airr object indicating if field is required by MiAIRR.
Rename rearrangement_set_id to data_processing_id.
Rename study_description to study_type.
Added physical_quantity format.
Moved raw sequencing files into a separate schema object.
Rename Attributes object.
Added primary_annotation and repertoire_id.
Added diagnosis to repertoire object.
Added ontology for organism.
Added more detailed specification of sequencing_run, repertoire and rearrangement.
Added repertoire schema.
Rename definitions.yaml to airr-schema.yaml.
Removed c_call, c_score and c_cigar from required as this is not typical reference aligner output.
Renamed vdj_score, vdj_identity, vdj_evalue, and vdj_cigar to score, identity, evalue, and cigar.
Added missing c_identity and c_evalue fields to Rearrangement spec.
Swapped order of N and S operators in CIGAR string.
Some description clean up for consistency in Rearrangement spec.
Remove repeated objects in definitions.yaml.
Added Alignment object to definitions.yaml.
Updated MiAIRR format consistency check TSV with junction change.
Changed definition from functional to productive.
Version 1.0.1: Jan 9, 2018
MiAIRR v1 official release and initial draft of Rearrangement and Alignment schemas.