Release Notes#
Schema Release Notes#
Version 1.5.1: June 2, 2024#
Version 1.5 patch release.
Corrected schema version number in the Info object.
Version 1.5.0: August 29, 2023#
Version 1.5 schema release.
General Schema Changes:
Fixed synchronization errors between the OpenAPI v2 and v3 versions of the AIRR Schema (airr-schema.yaml and airr-schema-openapi3.yaml).
Set the default value of x-airr.miairr attributes to defined.
Converted all x-airr.format attribute values to snake_case, which specifically impacts any instance of controlled vocabulary or physical quantity.
Corrected numerous instances of missing x-airr.miairr and x-airr.identifier attributes.
Replaced the x-airr.adc-api-optional attribute with x-airr.adc-query-support in multiple fields.
Added “IGI” as a valid value to the locus enum fields in multiple schema.
Added null as a valid value to all nullable enum fields.
Removed discriminator: AIRR from all object definitions.
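As a rough illustration of the snake_case conversion of x-airr.format values noted above, the sketch below rewrites the two affected labels. The helper name to_snake_case and the sample dictionary are hypothetical and are not part of the AIRR schema or its libraries.

```python
def to_snake_case(value: str) -> str:
    """Lowercase a label and join its whitespace-separated words with underscores."""
    return "_".join(value.strip().lower().split())


# Hypothetical x-airr block using a space-separated format label.
old_x_airr = {"format": "controlled vocabulary"}

# Same block with only the format value converted.
new_x_airr = {**old_x_airr, "format": to_snake_case(old_x_airr["format"])}

print(new_x_airr["format"])                # controlled_vocabulary
print(to_snake_case("physical quantity"))  # physical_quantity
```

Code that matches on the old space-separated labels would need the same substitution.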
Germline and Genotype Schema:
Clarified the descriptions of multiple fields in the Germline and Genotype schema.
Modified x-airr: nullable and x-airr: identifier values on multiple fields in the Germline and Genotype schema.
Removed the alignment field and added the unaligned_sequence, aligned_sequences, and alignment_labels fields to the SequenceDelineationV object.
Converted the enum values in the inference_type field of AlleleDescription to snake_case.
Added the allele_similarity_cluster_designation and allele_similarity_cluster_member_id fields to AlleleDescription.
Moved the nested objects DocumentedAllele, UndocumentedAllele, and DeletedGenes out of Genotype and defined them as top-level objects referenced by the documented_alleles, undocumented_alleles, and deleted_genes fields, respectively.
Moved the nested object MHCAllele out of MHCGenotype and defined it as a top-level object referenced by the mhc_alleles field.
Single-cell Schema:
Added the property_type field to the CellExpression object.
Moved the nested ReceptorReactivity object out of Receptor and defined it as a top-level object referenced by the reactivity_measurements field.
Subject Schema:
Removed the nested references to GenotypeSet and MHCGenotypeSet in the genotype field and modified the definition to point to a top-level SubjectGenotype object defining these references.
DataProcessing Schema:
Clarified the description of quality_thresholds to indicate that quality filtering is not mandatory.
Version 1.4.1: August 27, 2022#
Version 1.4 schema release.
New General Purpose Schema:
Introduced the experimental DataFile object, which defines a JSON file holding Repertoire metadata, data processing analysis objects, or any object in the AIRR Data Model.
Introduced the experimental RepertoireGroup Schema for describing collections of repertoires to be analyzed together.
Introduced the experimental InfoObject Schema, which provides information about data and ADC API responses.
Introduced the experimental TimePoint Schema for defining the time point at which an observation or other action was performed.
New Germline and Genotype Schema:
The following experimental schema were introduced to support storage of VDJ germline reference sequences, VDJ genotypes, and MHC genotypes:
GermlineSet: Defines a collection of AlleleDescriptions from the same strain or species.
AlleleDescription: Details of a putative or confirmed Ig receptor gene/allele inferred from one or more observations.
RearrangedSequence: Details of a directly observed rearranged sequence or an inference from rearranged sequences contributing support for a gene or allele.
UnrearrangedSequence: Details of an unrearranged sequence contributing support for a gene or allele.
SequenceDelineationV: Delineation of a V-gene in a particular system.
GenotypeSet: Defines a collection of VDJ genotypes for a given subject.
Genotype: Enumerates the alleles and gene deletions inferred in a single subject for a single locus.
MHCGenotypeSet: Defines a collection of MHC genotypes for a given subject.
MHCGenotype: Details the genotype of major histocompatibility complex (MHC) class I, class II and non-classical loci.
Acknowledgement: Defines contributors to the germline or genotype description.
New Single-cell Schema:
The following experimental schema were introduced to improve support for single-cell data and extend the Cell schema.
CellExpression: Defines a container to store single-cell expression level measurements.
Receptor: Describes a complete receptor protein sequence and its reactivity.
Rearrangement Schema:
Added the optional fields v_frameshift, j_frameshift, d_frame, and d2_frame defining annotations related to alignment reading frames.
Added the optional field umi_count to represent the count of distinct UMIs for a sequence.
Modified the definition of duplicate_count to remove ambiguity with the new umi_count field in a single-cell context. There is now a distinction between duplicate observed sequences (duplicate_count) and UMIs (umi_count).
Added the optional quality and quality_alignment fields to store Phred quality scores for base calls in the sequence and sequence_alignment fields, respectively.
Added the following optional fields to denote constant region (c_call) alignment positions: c_sequence_start, c_sequence_end, c_germline_start, c_germline_end, c_alignment_start, c_alignment_end.
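The distinction between duplicate_count and umi_count can be sketched as follows. This is a minimal illustration only: the read list, the UMI labels, and the variable names are made up, and real pipelines derive these counts in tool-specific ways.

```python
from collections import defaultdict

# Hypothetical reads as (sequence, UMI) pairs; values are illustrative only.
reads = [
    ("ACGT", "UMI1"),
    ("ACGT", "UMI1"),
    ("ACGT", "UMI2"),
    ("TTGA", "UMI3"),
]

duplicate_count = defaultdict(int)  # number of observations per sequence
umis_seen = defaultdict(set)        # distinct UMIs observed per sequence

for sequence, umi in reads:
    duplicate_count[sequence] += 1
    umis_seen[sequence].add(umi)

umi_count = {seq: len(umis) for seq, umis in umis_seen.items()}

print(dict(duplicate_count))  # {'ACGT': 3, 'TTGA': 1}
print(umi_count)              # {'ACGT': 2, 'TTGA': 1}
```

Here the sequence ACGT was observed three times but is backed by only two distinct UMIs, so the two fields carry different values.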
Study Schema:
Added the optional field study_contact to store contact information for the primary study contact.
Modified the enumerated values supported by keywords_study to the following set: contains_ig, contains_tr, contains_paired_chain, contains_schema_rearrangement, contains_schema_clone, contains_schema_cell, contains_schema_receptor.
Added the optional fields adc_publish_date and adc_update_date that timestamp AIRR Data Commons initial publication and last update, respectively.
Subject Schema:
Added the optional genotype field linking to the new GenotypeSet and MHCGenotypeSet objects.
Sample Schema:
Added the required field collection_time_point_relative_unit defining the units for the sample collection timestamp.
Modified the type of the field collection_time_point_relative from a string to a number defined in combination with the new unit ontology field collection_time_point_relative_unit.
NucleicAcidProcessing Schema:
Added the required field template_amount_unit defining the units for the input template quantification.
Modified the type of the template_amount field from a string to a number defined in combination with the new unit ontology field template_amount_unit.
Clone Schema:
Added the optional clone_count field to specify the absolute count of clonal members.
Added the optional umi_count field to specify the total UMI count of all clonal members.
Cell Schema:
Removed the field expression_tabular, whose functionality has been replaced by the new CellExpression schema.
Version 1.3.1: October 13, 2020#
Version 1.3 documentation patch release.
Alignment Schema:
Added the deprecation tags for rearrangement_id, which were accidentally left out of the v1.3.0 release.
Version 1.3.0: May 28, 2020#
Version 1.3 schema release.
New Schema:
Introduced the Repertoire Schema for describing study metadata.
Introduced the PCRTarget Schema for describing primer target locations.
Introduced the SampleProcessing Schema for describing experimental processing steps for a sample.
Replaced the SoftwareProcessing schema with the DataProcessing schema.
Introduced experimental schema for clonal clusters, lineage trees, tree nodes, and cells as Clone, Tree, Node, and Cell objects, respectively.
General Updates:
Added multiple additional attributes to a large number of schema properties as AIRR extension attributes in the x-airr field. The new Attributes object contains definitions for these x-airr field attributes.
Added the top level required property to all relevant schema objects.
Added the title attribute containing the short, descriptive name to all relevant schema object fields.
Added an example attribute containing an example data value to multiple schema object fields.
AIRR Data Commons API:
Added OpenAPI V2 specification (specs/adc-api.yaml) for AIRR Data Commons API major version 1.
Ontology Support:
Added Ontology and CURIEResolution objects to support ontologies.
Added vocabularies/ontologies as JSON string for: Cell subset, Target substrate, Library generation method, Complete sequences, Physical linkage of different loci.
Rearrangement Schema:
Added the complete_vdj field to annotate whether a V(D)J alignment was full length.
Added the junction_length_aa field defining the length of the junction amino acid sequence.
Added the repertoire_id, sample_processing_id, and data_processing_id fields to serve as linkers to the appropriate metadata objects.
Added a controlled vocabulary to the locus field: IGH, IGI, IGK, IGL, TRA, TRB, TRD, TRG.
Deprecated the rearrangement_set_id and germline_database fields.
Deprecated the rearrangement_id field and made the sequence_id field the primary unique identifier for a rearrangement record, both in files and data repositories.
Added support for secondary D gene rearrangement through the additional fields: d2_call, d2_score, d2_identity, d2_support, d2_cigar, np3, np3_aa, np3_length, n3_length, p5d2_length, p3d2_length, d2_sequence_start, d2_sequence_end, d2_germline_start, d2_germline_end, d2_alignment_start, d2_alignment_end, d2_sequence_alignment, d2_sequence_alignment_aa, d2_germline_alignment, d2_germline_alignment_aa.
Updated field definitions with more concise V(D)J call descriptions.
Alignment Schema:
Deprecated the rearrangement_set_id and germline_database fields.
Added the data_processing_id field.
Study Schema:
Added the study_type field containing an ontology-defined term for the study design.
Subject Schema:
Deprecated the organism field in favor of the new species field.
Deprecated the age field.
Introduced age ranges: age_min, age_max, and age_unit.
Diagnosis Schema:
Changed the type of the disease_diagnosis field from string to Ontology.
Sample Schema:
Changed the type of the tissue field from string to Ontology.
CellProcessing Schema:
Changed the type of the cell_subset field from string to Ontology.
Introduced the cell_species field, which denotes the species from which the analyzed cells originate.
NucleicAcidProcessing Schema:
Defined the template_class field as type string.
Added a controlled vocabulary to the library_generation_method field.
Changed the controlled vocabulary terms of complete_sequences, replacing complete & untemplated with complete+untemplated and adding mixed.
Added the pcr_target field referencing the new PCRTarget schema object.
SequencingRun Schema:
Added the sequencing_run_id field, which serves as the object identifier field.
Added the sequencing_files field, which links to the RawSequenceData schema objects defining the raw read data.
RawSequenceData Schema:
Added the file_type field defining the sequence file type. This field is a controlled vocabulary restricted to: fasta, fastq.
Added the paired_read_length field defining mate-pair read lengths.
Defined the read_direction and paired_read_direction fields as type string.
DataProcessing Schema:
Replaces the SoftwareProcessing object.
Added data_processing_id, primary_annotation, data_processing_files, germline_database, and analysis_provenance_id fields.
Version 1.2.1: Oct 5, 2018#
Minor patch release.
Schema gene vs segment terminology corrections
Added Info object.
Updated cell_subset URL in AIRR schema.
Version 1.2.0: Aug 18, 2018#
Peer-reviewed release of the Rearrangement schema.
Definition change for the coordinate fields of the Rearrangement and Alignment schema. Coordinates are now defined as 1-based closed intervals, instead of 0-based half-open intervals (as previously defined in v1.1 of the schema).
Removed foreign study_id fields.
Introduced keywords_study field.
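The coordinate change from 0-based half-open to 1-based closed intervals amounts to adding one to the start and leaving the end unchanged. The sketch below shows that arithmetic; the function names and the example sequence are hypothetical and are not part of the AIRR libraries.

```python
from typing import Tuple


def to_one_based_closed(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open interval [start, end) to 1-based closed [start, end]."""
    return start + 1, end


def to_zero_based_half_open(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based closed interval [start, end] back to 0-based half-open [start, end)."""
    return start - 1, end


sequence = "ACGTACGT"
old_start, old_end = 2, 5  # 0-based half-open: sequence[2:5] is "GTA"
new_start, new_end = to_one_based_closed(old_start, old_end)

print(new_start, new_end)                    # 3 5
print(sequence[new_start - 1:new_end])       # GTA
print(new_end - new_start + 1)               # 3 (length of a closed interval)
```

Note that the length of a region is end - start in the old convention and end - start + 1 in the new one.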
Version 1.1.0: May 3, 2018#
Initial public release of the Rearrangement and Alignment schemas.
Added required and nullable constraints to AIRR schema.
Schema definitions for MiAIRR attributes and ontology.
Introduction of an x-airr object indicating if a field is required by MiAIRR.
Renamed rearrangement_set_id to data_processing_id.
Renamed study_description to study_type.
Added physical_quantity format.
Moved raw sequencing files into a separate schema object.
Renamed Attributes object.
Added primary_annotation and repertoire_id.
Added diagnosis to repertoire object.
Added ontology for organism.
Added more detailed specification of sequencing_run, repertoire and rearrangement.
Added repertoire schema.
Renamed definitions.yaml to airr-schema.yaml.
Removed c_call, c_score and c_cigar from required as this is not typical reference aligner output.
Renamed vdj_score, vdj_identity, vdj_evalue, and vdj_cigar to score, identity, evalue, and cigar.
Added missing c_identity and c_evalue fields to Rearrangement spec.
Swapped order of N and S operators in CIGAR string.
Some description clean up for consistency in Rearrangement spec.
Removed repeated objects in definitions.yaml.
Added Alignment object to definitions.yaml.
Updated MiAIRR format consistency check TSV with junction change.
Changed definition from functional to productive.
Version 1.0.1: Jan 9, 2018#
MiAIRR v1 official release and initial draft of Rearrangement and Alignment schemas.
Python Library Release Notes#
R Library Release Notes#
Version 1.5.0: August 29, 2023#
Updated schema set and examples to v1.5.
Version 1.4.1: August 27, 2022#
Significant internal refactoring to improve schema generalizability, harmonize behavior between the Python and R libraries, and prepare for AIRR Standards v2.0.
Rearrangement:
Added the aux_types argument to read_tabular, read_rearrangement, and read_alignment to allow explicit declaration of the type for fields that are not defined in the schema.
Renamed read_airr, write_airr, and validate_airr to read_tabular, write_tabular, and validate_tabular, respectively.
Data Model and Schema:
Defined new read_airr, write_airr, and validate_airr functions that support AIRR Data Model files that store arrays of objects in JSON or YAML.
Added support for the AIRR Model Data File and associated schema (DataFile, Info). The Data File data format holds AIRR objects of multiple types and is backwards compatible with Repertoire metadata.
Added support for the new germline and genotyping schema (GermlineSet, GenotypeSet) and associated schema.
Version 1.3.0: May 26, 2020#
Updated schema set to v1.3.
Added info slot to Schema object containing general schema information.
Version 1.2.0: August 17, 2018#
Updated schema set to v1.2.
Changed defaults to base="1" for read and write functions.
Updated example TSV file with coordinate changes, addition of germline_alignment data and simplification of sequence_id values.
Version 1.1.0: May 1, 2018#
Initial release.