Schema Release Notes
Version 1.4.1: August 27, 2022
Version 1.4 schema release.
New General Purpose Schema:
- Introduced the experimental DataFile object, which defines a JSON file holding Repertoire metadata, data processing analysis objects, or any object in the AIRR Data Model.
- Introduced the experimental RepertoireGroup Schema for describing collections of repertoires to be analyzed together.
- Introduced the experimental InfoObject Schema, which provides information about data and ADC API responses.
- Introduced the experimental TimePoint Schema for defining the time point at which an observation or other action was performed.
New Germline and Genotype Schema:
The following experimental schema were introduced to support storage of VDJ germline reference sequences, VDJ genotypes, and MHC genotypes:
- GermlineSet: Defines a collection of AlleleDescriptions from the same strain or species.
- AlleleDescription: Details of a putative or confirmed Ig receptor gene/allele inferred from one or more observations.
- RearrangedSequence: Details of a directly observed rearranged sequence or an inference from rearranged sequences contributing support for a gene or allele.
- UnrearrangedSequence: Details of an unrearranged sequence contributing support for a gene or allele.
- SequenceDelineationV: Delineation of a V-gene in a particular system.
- GenotypeSet: Defines a collection of VDJ genotypes for a given subject.
- Genotype: Enumerates the alleles and gene deletions inferred in a single subject for a single locus.
- MHCGenotypeSet: Defines a collection of MHC genotypes for a given subject.
- MHCGenotype: Details the genotype of major histocompatibility complex (MHC) class I, class II and non-classical loci.
- Acknowledgement: Defines contributors to the germline or genotype description.
New Single-cell Schema:
The following experimental schema were introduced to improve support for single-cell data and extend the Cell schema.
- CellExpression: Defines a container to store single-cell expression level measurements.
- Receptor: Describes a complete receptor protein sequence and its reactivity.
Rearrangement Schema:
- Added the optional fields v_frameshift, j_frameshift, d_frame and d2_frame defining annotations related to alignment reading frames.
- Added the optional field umi_count to represent the count of distinct UMIs for a sequence.
- Modified the definition of duplicate_count to remove ambiguity with the new umi_count field in a single-cell context. There is now a distinction between duplicate observed sequences (duplicate_count) and UMIs (umi_count).
- Added the optional quality and quality_alignment fields to store Phred quality scores for base calls in the sequence and sequence_alignment fields, respectively.
- Added the following optional fields to denote constant region (c_call) alignment positions: c_sequence_start, c_sequence_end, c_germline_start, c_germline_end, c_alignment_start, c_alignment_end.
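To illustrate how the new quality field relates to the sequence field, the following minimal sketch decodes a quality string into one integer Phred score per base call. It assumes the common Phred+33 ASCII encoding used by FASTQ files; the function name decode_phred and the sample record are hypothetical and are not part of any AIRR library.

```python
def decode_phred(quality: str, offset: int = 33) -> list[int]:
    """Convert an ASCII-encoded quality string to integer Phred scores.

    The default offset of 33 is an assumption (Phred+33 encoding).
    """
    return [ord(ch) - offset for ch in quality]


# Hypothetical record: one quality character per base in `sequence`.
record = {"sequence": "ACGT", "quality": "II5!"}
scores = decode_phred(record["quality"])

# A well-formed record has exactly one score per base call.
if len(scores) != len(record["sequence"]):
    raise ValueError("quality and sequence lengths differ")
```

The same function applies to quality_alignment and sequence_alignment, as long as both strings use the same encoding.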
Study Schema:
- Added the optional field study_contact to store contact information for the primary study contact.
- Modified the enumerated values supported by keywords_study to the following set: contains_ig, contains_tr, contains_paired_chain, contains_schema_rearrangement, contains_schema_clone, contains_schema_cell, contains_schema_receptor.
- Added the optional fields adc_publish_date and adc_update_date that timestamp AIRR Data Commons initial publication and last update, respectively.
Subject Schema:
- Added the optional genotype field linking to the new GenotypeSet and MHCGenotypeSet objects.
Sample Schema:
- Added the required field collection_time_point_relative_unit defining the units for the sample collection timestamp.
- Modified the type of the field collection_time_point_relative from a string to a number defined in combination with the new unit ontology field collection_time_point_relative_unit.
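The change from a single string to a number plus a unit can be sketched as a small migration helper. This is only an illustration under stated assumptions: it assumes legacy values look like "14 days", and it represents the unit as a dictionary holding only a label, which is a simplified stand-in for the actual ontology term. The function name split_time_point is hypothetical.

```python
import re


def split_time_point(value: str) -> tuple[float, str]:
    """Split a legacy string such as "14 days" into (number, unit label).

    The "<number> <unit>" input format is an assumption for illustration.
    """
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", value)
    if match is None:
        raise ValueError(f"cannot parse time point: {value!r}")
    number = float(match.group(1))
    unit = match.group(2).lower()
    # Normalize a simple plural ("days" -> "day").
    if unit.endswith("s") and len(unit) > 1:
        unit = unit[:-1]
    return number, unit


# Hypothetical legacy sample record.
legacy = {"collection_time_point_relative": "14 days"}
number, unit = split_time_point(legacy["collection_time_point_relative"])

migrated = {
    "collection_time_point_relative": number,
    # Assumed shape: the real field holds an ontology term, not just a label.
    "collection_time_point_relative_unit": {"label": unit},
}
```

Values that do not match the assumed pattern raise a ValueError rather than being guessed at, so they can be reviewed by hand.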
NucleicAcidProcessing Schema:
- Added the required field template_amount_unit defining the units for the input template quantification.
- Modified the type of the template_amount field from a string to a number defined in combination with the new unit ontology field template_amount_unit.
Clone Schema:
- Added the optional clone_count field to specify absolute count of clonal members.
- Added the optional umi_count field to specify the total UMI count of all clonal members.
Cell Schema:
- Removed the field expression_tabular whose functionality has been replaced by the new CellExpression schema.
Version 1.3.1: October 13, 2020
Version 1.3 documentation patch release.
Alignment Schema:
- Added the deprecation tags for rearrangement_id, which were accidentally left out of the v1.3.0 release.
Version 1.3.0: May 28, 2020
Version 1.3 schema release.
New Schema:
- Introduced the Repertoire Schema for describing study metadata.
- Introduced the PCRTarget Schema for describing primer target locations.
- Introduced the SampleProcessing Schema for describing experimental processing steps for a sample.
- Replaced the SoftwareProcessing schema with the DataProcessing schema.
- Introduced experimental schema for clonal clusters, lineage trees, tree nodes, and cells as Clone, Tree, Node, and Cell objects, respectively.
General Updates:
- Added multiple additional attributes to a large number of schema properties as AIRR extension attributes in the x-airr field. The new Attributes object contains definitions for these x-airr field attributes.
- Added the top level required property to all relevant schema objects.
- Added the title attribute containing the short, descriptive name to all relevant schema object fields.
- Added an example attribute containing an example data value to multiple schema object fields.
AIRR Data Commons API:
- Added OpenAPI V2 specification (specs/adc-api.yaml) for AIRR Data Commons API major version 1.
Ontology Support:
- Added Ontology and CURIEResolution objects to support ontologies.
- Added vocabularies/ontologies as JSON string for: Cell subset, Target substrate, Library generation method, Complete sequences, Physical linkage of different loci.
Rearrangement Schema:
- Added the complete_vdj field to annotate whether a V(D)J alignment was full length.
- Added the junction_length_aa field defining the length of the junction amino acid sequence.
- Added the repertoire_id, sample_processing_id, and data_processing_id fields to serve as linkers to the appropriate metadata objects.
- Added a controlled vocabulary to the locus field: IGH, IGI, IGK, IGL, TRA, TRB, TRD, TRG.
- Deprecated the rearrangement_set_id and germline_database fields.
- Deprecated the rearrangement_id field and made the sequence_id field the primary unique identifier for a rearrangement record, both in files and data repositories.
- Added support for secondary D gene rearrangement through the additional fields: d2_call, d2_score, d2_identity, d2_support, d2_cigar, np3, np3_aa, np3_length, n3_length, p5d2_length, p3d2_length, d2_sequence_start, d2_sequence_end, d2_germline_start, d2_germline_end, d2_alignment_start, d2_alignment_end, d2_sequence_alignment, d2_sequence_alignment_aa, d2_germline_alignment, d2_germline_alignment_aa.
- Updated field definitions with more concise V(D)J call descriptions.
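A controlled vocabulary such as the one added to the locus field can be checked with a simple set lookup. The sketch below is illustrative only: the helper name invalid_loci and the sample records are hypothetical, and treating a missing (None) locus as acceptable is an assumption rather than something stated in these notes.

```python
# Vocabulary for the locus field as listed in the release notes above.
ALLOWED_LOCI = {"IGH", "IGI", "IGK", "IGL", "TRA", "TRB", "TRD", "TRG"}


def invalid_loci(records):
    """Return (sequence_id, locus) pairs whose locus is outside the vocabulary.

    Records with no locus value are skipped (an assumption for this sketch).
    """
    return [
        (rec.get("sequence_id"), rec["locus"])
        for rec in records
        if rec.get("locus") is not None and rec["locus"] not in ALLOWED_LOCI
    ]


# Hypothetical rearrangement records keyed by sequence_id.
records = [
    {"sequence_id": "s1", "locus": "IGH"},
    {"sequence_id": "s2", "locus": "IGX"},
    {"sequence_id": "s3", "locus": None},
]
problems = invalid_loci(records)
```

Reporting by sequence_id follows the note above that sequence_id is the primary unique identifier for a rearrangement record.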
Alignment Schema:
- Deprecated the rearrangement_set_id and germline_database fields.
- Added the data_processing_id field.
Study Schema:
- Added the study_type field containing an ontology defined term for the study design.
Subject Schema:
- Deprecated the organism field in favor of the new species field.
- Deprecated the age field.
- Introduced age ranges: age_min, age_max, and age_unit.
Diagnosis Schema:
- Changed the type of the disease_diagnosis field from string to Ontology.
Sample Schema:
- Changed the type of the tissue field from string to Ontology.
CellProcessing Schema:
- Changed the type of the cell_subset field from string to Ontology.
- Introduced the cell_species field which denotes the species from which the analyzed cells originate.
NucleicAcidProcessing Schema:
- Defined the template_class field as type string.
- Added a controlled vocabulary to the library_generation_method field.
- Changed the controlled vocabulary terms of complete_sequences, replacing "complete & untemplated" with "complete+untemplated" and adding "mixed".
- Added the pcr_target field referencing the new PCRTarget schema object.
SequencingRun Schema:
- Added the sequencing_run_id field which serves as the object identifier field.
- Added the sequencing_files field which links to the RawSequenceData schema objects defining the raw read data.
RawSequenceData Schema:
- Added the file_type field defining the sequence file type. This field is a controlled vocabulary restricted to: fasta, fastq.
- Added the paired_read_length field defining mate-pair read lengths.
- Defined the read_direction and paired_read_direction fields as type string.
DataProcessing Schema:
- Replaces the SoftwareProcessing object.
- Added data_processing_id, primary_annotation, data_processing_files, germline_database and analysis_provenance_id fields.
Version 1.2.1: Oct 5, 2018
Minor patch release.
- Schema gene vs segment terminology corrections.
- Added Info object.
- Updated cell_subset URL in AIRR schema.
Version 1.2.0: Aug 18, 2018
Peer-reviewed release of the Rearrangement schema.
- Definition change for the coordinate fields of the Rearrangement and Alignment schema. Coordinates are now defined as 1-based closed intervals, instead of 0-based half-open intervals (as previously defined in v1.1 of the schema).
- Removed foreign study_id fields.
- Introduced keywords_study field.
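The coordinate definition change in this release can be made concrete with a short conversion sketch. A 0-based half-open interval [start, end) covers the same positions as the 1-based closed interval [start + 1, end], so only the start value shifts. The function names below are hypothetical and chosen for illustration.

```python
def to_one_based_closed(start: int, end: int) -> tuple[int, int]:
    """Convert a 0-based half-open interval [start, end) to 1-based closed."""
    return start + 1, end


def to_zero_based_half_open(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based closed interval [start, end] to 0-based half-open."""
    return start - 1, end


sequence = "ACGTACGT"

# v1.1-style coordinates: 0-based half-open, directly usable as a slice.
old_start, old_end = 2, 5
old_segment = sequence[old_start:old_end]

# v1.2-style coordinates: 1-based closed; subtract 1 from start when slicing.
new_start, new_end = to_one_based_closed(old_start, old_end)
new_segment = sequence[new_start - 1:new_end]
```

Both slices select the same three bases, and the interval length is end - start in the old convention and end - start + 1 in the new one.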
Version 1.1.0: May 3, 2018
Initial public release of the Rearrangement and Alignment schemas.
- Added required and nullable constraints to AIRR schema.
- Schema definitions for MiAIRR attributes and ontology.
- Introduction of an x-airr object indicating if field is required by MiAIRR.
- Renamed rearrangement_set_id to data_processing_id.
- Renamed study_description to study_type.
- Added physical_quantity format.
- Moved raw sequencing files into a separate schema object.
- Renamed Attributes object.
- Added primary_annotation and repertoire_id.
- Added diagnosis to repertoire object.
- Added ontology for organism.
- Added more detailed specification of sequencing_run, repertoire and rearrangement.
- Added repertoire schema.
- Renamed definitions.yaml to airr-schema.yaml.
- Removed c_call, c_score and c_cigar from required as this is not typical reference aligner output.
- Renamed vdj_score, vdj_identity, vdj_evalue, and vdj_cigar to score, identity, evalue, and cigar.
- Added missing c_identity and c_evalue fields to Rearrangement spec.
- Swapped order of N and S operators in CIGAR string.
- Some description clean up for consistency in Rearrangement spec.
- Removed repeated objects in definitions.yaml.
- Added Alignment object to definitions.yaml.
- Updated MiAIRR format consistency check TSV with junction change.
- Changed definition from functional to productive.
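The field renames listed above (vdj_score, vdj_identity, vdj_evalue, and vdj_cigar to score, identity, evalue, and cigar) lend themselves to a mapping-based update of older records. The sketch below assumes a record is held as a plain dictionary; the helper name rename_fields and the sample values are hypothetical.

```python
# Old-to-new field names, taken from the v1.1.0 notes above.
RENAMED_FIELDS = {
    "vdj_score": "score",
    "vdj_identity": "identity",
    "vdj_evalue": "evalue",
    "vdj_cigar": "cigar",
}


def rename_fields(record: dict) -> dict:
    """Return a copy of the record with renamed keys; other keys are kept."""
    return {RENAMED_FIELDS.get(key, key): value for key, value in record.items()}


# Hypothetical pre-1.1.0 record with made-up values.
old_record = {"sequence_id": "s1", "vdj_score": 120.5, "vdj_cigar": "10S90M"}
new_record = rename_fields(old_record)
```

Because the function builds a new dictionary, the original record is left unchanged and can be kept for comparison.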
Version 1.0.1: Jan 9, 2018
MiAIRR v1 official release and initial draft of Rearrangement and Alignment schemas.