Schema Release Notes
Version 1.3.0: May 28, 2020
Version 1.3 schema release.
New Schema:
Introduced the Repertoire Schema for describing study metadata.
Introduced the PCRTarget Schema for describing primer target locations.
Introduced the SampleProcessing Schema for describing experimental processing steps for a sample.
Replaced the SoftwareProcessing schema with the DataProcessing schema.
Introduced experimental schema for clonal clusters, lineage trees, tree nodes, and cells as Clone, Tree, Node, and Cell objects, respectively.
General Updates:
Added multiple additional attributes to a large number of schema properties as AIRR extension attributes in the x-airr field. The new Attributes object contains definitions for these x-airr field attributes.
Added the top level required property to all relevant schema objects.
Added the title attribute containing the short, descriptive name to all relevant schema object fields.
Added an example attribute containing an example data value to multiple schema object fields.
AIRR Data Commons API:
Added OpenAPI V2 specification (specs/adc-api.yaml) for AIRR Data Commons API major version 1.
Ontology Support:
Added Ontology and CURIEResolution objects to support ontologies.
Added vocabularies/ontologies as JSON string for: Cell subset, Target substrate, Library generation method, Complete sequences, Physical linkage of different loci.
Rearrangement Schema:
Added the complete_vdj field to annotate whether a V(D)J alignment was full length.
Added the junction_length_aa field defining the length of the junction amino acid sequence.
Added the repertoire_id, sample_processing_id, and data_processing_id fields to serve as linkers to the appropriate metadata objects.
Added a controlled vocabulary to the locus field: IGH, IGI, IGK, IGL, TRA, TRB, TRD, TRG.
Deprecated the rearrangement_set_id and germline_database fields.
Deprecated the rearrangement_id field and made the sequence_id field the primary unique identifier for a rearrangement record, both in files and data repositories.
Added support for secondary D gene rearrangement through the additional fields: d2_call, d2_score, d2_identity, d2_support, d2_cigar, np3, np3_aa, np3_length, n3_length, p5d2_length, p3d2_length, d2_sequence_start, d2_sequence_end, d2_germline_start, d2_germline_end, d2_alignment_start, d2_alignment_end, d2_sequence_alignment, d2_sequence_alignment_aa, d2_germline_alignment, d2_germline_alignment_aa.
Updated field definitions with more concise V(D)J call descriptions.
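As a rough illustration of the locus vocabulary and the identifier change listed above, the following sketch checks a single rearrangement record held as a plain Python dict. The helper name check_rearrangement and the dict representation are assumptions made for this example; they are not taken from the AIRR reference library.

```python
# Controlled vocabulary for the locus field, as listed in the notes above.
LOCUS_VALUES = {"IGH", "IGI", "IGK", "IGL", "TRA", "TRB", "TRD", "TRG"}

# Rearrangement fields deprecated in v1.3.0, per the notes above.
DEPRECATED_FIELDS = {"rearrangement_id", "rearrangement_set_id", "germline_database"}


def check_rearrangement(record):
    """Return a list of problems found in one record (hypothetical helper)."""
    problems = []
    # sequence_id is the primary unique identifier as of v1.3.0.
    if not record.get("sequence_id"):
        problems.append("missing sequence_id")
    # A locus value, when present, must come from the controlled vocabulary.
    locus = record.get("locus")
    if locus is not None and locus not in LOCUS_VALUES:
        problems.append(f"locus {locus!r} is not in the controlled vocabulary")
    # Report deprecated fields that are still present in the record.
    for name in sorted(DEPRECATED_FIELDS.intersection(record)):
        problems.append(f"{name} is deprecated")
    return problems
```

A record with a sequence_id and a locus of IGH yields an empty list; a record that still carries rearrangement_id is reported as using a deprecated field.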
Alignment Schema:
Deprecated the rearrangement_set_id and germline_database fields.
Added the data_processing_id field.
Study Schema:
Added the study_type field containing an ontology defined term for the study design.
Subject Schema:
Deprecated the organism field in favor of the new species field.
Deprecated the age field.
Introduced age ranges: age_min, age_max, and age_unit.
Diagnosis Schema:
Changed the type of the disease_diagnosis field from string to Ontology.
Sample Schema:
Changed the type of the tissue field from string to Ontology.
CellProcessing Schema:
Changed the type of the cell_subset field from string to Ontology.
Introduced the cell_species field which denotes the species from which the analyzed cells originate.
NucleicAcidProcessing Schema:
Defined the template_class field as type string.
Added a controlled vocabulary to the library_generation_method field.
Changed the controlled vocabulary terms of complete_sequences, replacing complete & untemplated with complete+untemplated and adding mixed.
Added the pcr_target field referencing the new PCRTarget schema object.
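The complete_sequences term change above can be handled with a small lookup when reading older metadata. This is a minimal sketch; the function name upgrade_complete_sequences is hypothetical, and only the one renamed term stated in these notes is mapped.

```python
# Term renamed in v1.3.0, per the notes above (old spelling -> new spelling).
LEGACY_TERMS = {"complete & untemplated": "complete+untemplated"}


def upgrade_complete_sequences(value):
    """Map a pre-1.3 complete_sequences term to its v1.3.0 spelling.

    Hypothetical helper: any value that was not renamed is returned unchanged.
    """
    return LEGACY_TERMS.get(value, value)
```

Values that were not renamed, including the newly added mixed, pass through untouched.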
SequencingRun Schema:
Added the sequencing_run_id field which serves as the object identifier field.
Added the sequencing_files field which links to the RawSequenceData schema objects defining the raw read data.
RawSequenceData Schema:
Added the file_type field defining the sequence file type. This field is a controlled vocabulary restricted to: fasta, fastq.
Added the paired_read_length field defining mate-pair read lengths.
Defined the read_direction and paired_read_direction fields as type string.
DataProcessing Schema:
Replaces the SoftwareProcessing object.
Added data_processing_id, primary_annotation, data_processing_files, germline_database and analysis_provenance_id fields.
Version 1.2.1: Oct 5, 2018
Minor patch release.
Schema gene vs segment terminology corrections
Added Info object
Updated cell_subset URL in AIRR schema
Version 1.2.0: Aug 18, 2018
Peer-reviewed release of the Rearrangement schema.
Definition change for the coordinate fields of the Rearrangement and Alignment schema. Coordinates are now defined as 1-based closed intervals, instead of 0-based half-open intervals (as previously defined in v1.1 of the schema).
Removed foreign study_id fields
Introduced keywords_study field
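The coordinate change in this release amounts to shifting the start position by one while leaving the end position as it is. The sketch below converts between the two conventions; the function names are hypothetical, and empty intervals are rejected because a closed interval cannot express them.

```python
def to_one_based_closed(start, end):
    """Convert a 0-based half-open interval [start, end) to 1-based closed [start, end]."""
    if start < 0 or end <= start:
        raise ValueError("expected a non-empty 0-based half-open interval")
    # Only the start shifts: the exclusive 0-based end equals the inclusive 1-based end.
    return start + 1, end


def to_zero_based_half_open(start, end):
    """Convert a 1-based closed interval [start, end] to 0-based half-open [start, end)."""
    if start < 1 or end < start:
        raise ValueError("expected a non-empty 1-based closed interval")
    return start - 1, end
```

For the sequence ACGTACGT, the v1.1 coordinates (2, 5) and the v1.2 coordinates (3, 5) both describe the substring GTA. In Python, a 1-based closed interval is sliced as seq[start - 1:end].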
Version 1.1.0: May 3, 2018
Initial public release of the Rearrangement and Alignment schemas.
Added required and nullable constraints to AIRR schema.
Schema definitions for MiAIRR attributes and ontology.
Introduction of an x-airr object indicating if field is required by MiAIRR.
Rename rearrangement_set_id to data_processing_id.
Rename study_description to study_type.
Added physical_quantity format.
Raw sequencing files into separate schema object.
Rename Attributes object.
Added primary_annotation and repertoire_id.
Added diagnosis to repertoire object.
Added ontology for organism.
Added more detailed specification of sequencing_run, repertoire and rearrangement.
Added repertoire schema.
Rename definitions.yaml to airr-schema.yaml.
Removed c_call, c_score and c_cigar from required as this is not typical reference aligner output.
Renamed vdj_score, vdj_identity, vdj_evalue, and vdj_cigar to score, identity, evalue, and cigar.
Added missing c_identity and c_evalue fields to Rearrangement spec.
Swapped order of N and S operators in CIGAR string.
Some description clean up for consistency in Rearrangement spec.
Remove repeated objects in definitions.yaml.
Added Alignment object to definitions.yaml.
Updated MiAIRR format consistency check TSV with junction change.
Changed definition from functional to productive.
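For data written against the earlier draft, the vdj_ field renames listed above can be applied with a simple key mapping. This is a minimal sketch assuming each record is a plain Python dict; the helper name rename_fields is hypothetical and covers only the four renames stated in these notes.

```python
# Field renames from the v1.1.0 notes above (old name -> new name).
RENAMED_FIELDS = {
    "vdj_score": "score",
    "vdj_identity": "identity",
    "vdj_evalue": "evalue",
    "vdj_cigar": "cigar",
}


def rename_fields(record):
    """Return a copy of the record with renamed keys; other keys are kept as-is."""
    return {RENAMED_FIELDS.get(key, key): value for key, value in record.items()}
```

The input record is not modified, so the original and the upgraded copy can be compared afterwards.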
Version 1.0.1: Jan 9, 2018
MiAIRR v1 official release and initial draft of Rearrangement and Alignment schemas.