Release Notes#
Schema Release Notes#
Version 1.4.0-dev: (In development)#
Version 1.4-dev, In development.
New Schema:
Introduced the RepertoireGroup Schema for describing sets of repertoires to be analyzed together. Has support for defining a time series using TimePoint.
Rearrangement Schema:
Added the optional fields v_frameshift, j_frameshift, d_frame and d2_frame defining annotations related to alignment reading frames.
Version 1.3.1: October 13, 2020#
Version 1.3 documentation patch release.
Alignment Schema:
Added the deprecation tags for rearrangement_id, which were accidentally left out of the v1.3.0 release.
Version 1.3.0: May 28, 2020#
Version 1.3 schema release.
New Schema:
Introduced the Repertoire Schema for describing study meta data.
Introduced the PCRTarget Schema for describing primer target locations.
Introduced the SampleProcessing Schema for describing experimental processing steps for a sample.
Replaced the SoftwareProcessing schema with the DataProcessing schema.
Introduced experimental schema for clonal clusters, lineage trees, tree nodes, and cells as Clone, Tree, Node, and Cell objects, respectively.
General Updates:
Added multiple additional attributes to a large number of schema properties as AIRR extension attributes in the x-airr field. The new Attributes object contains definitions for these x-airr field attributes.
Added the top level required property to all relevant schema objects.
Added the title attribute containing the short, descriptive name to all relevant schema object fields.
Added an example attribute containing an example data value to multiple schema object fields.
AIRR Data Commons API:
Added OpenAPI V2 specification (specs/adc-api.yaml) for AIRR Data Commons API major version 1.
Ontology Support:
Added Ontology and CURIEResolution objects to support ontologies.
Added vocabularies/ontologies as JSON string for: Cell subset, Target substrate, Library generation method, Complete sequences, Physical linkage of different loci.
Rearrangement Schema:
Added the complete_vdj field to annotate whether a V(D)J alignment was full length.
Added the junction_length_aa field defining the length of the junction amino acid sequence.
Added the repertoire_id, sample_processing_id, and data_processing_id fields to serve as linkers to the appropriate metadata objects.
Added a controlled vocabulary to the locus field: IGH, IGI, IGK, IGL, TRA, TRB, TRD, TRG.
Deprecated the rearrangement_set_id and germline_database fields.
Deprecated the rearrangement_id field and made the sequence_id field the primary unique identifier for a rearrangement record, both in files and data repositories.
Added support for secondary D gene rearrangement through the additional fields: d2_call, d2_score, d2_identity, d2_support, d2_cigar, np3, np3_aa, np3_length, n3_length, p5d2_length, p3d2_length, d2_sequence_start, d2_sequence_end, d2_germline_start, d2_germline_end, d2_alignment_start, d2_alignment_end, d2_sequence_alignment, d2_sequence_alignment_aa, d2_germline_alignment, d2_germline_alignment_aa.
Updated field definitions with more concise V(D)J call descriptions.
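A minimal sketch of checking records against the locus controlled vocabulary listed above. The helper names (is_valid_locus, invalid_loci) and the sample records are hypothetical and are not part of the AIRR reference library. Treating a missing locus as invalid is an assumption made only for this illustration.

```python
# Locus terms copied from the controlled vocabulary in these release notes.
LOCUS_VOCABULARY = frozenset(
    {"IGH", "IGI", "IGK", "IGL", "TRA", "TRB", "TRD", "TRG"}
)


def is_valid_locus(value):
    """Return True if value is one of the allowed locus terms (hypothetical helper)."""
    return value in LOCUS_VOCABULARY


def invalid_loci(rows):
    """Collect (sequence_id, locus) pairs whose locus is outside the vocabulary.

    Assumption: a missing locus is reported as invalid in this sketch.
    """
    return [
        (row.get("sequence_id"), row.get("locus"))
        for row in rows
        if not is_valid_locus(row.get("locus"))
    ]


# Hypothetical example records, not real data.
records = [
    {"sequence_id": "seq1", "locus": "IGH"},
    {"sequence_id": "seq2", "locus": "igk"},  # wrong case, so not in the vocabulary
]
print(invalid_loci(records))
```

The comparison is case-sensitive, so lower-case values such as igk are reported rather than silently normalized.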
Alignment Schema:
Deprecated the rearrangement_set_id and germline_database fields.
Added the data_processing_id field.
Study Schema:
Added the study_type field containing an ontology defined term for the study design.
Subject Schema:
Deprecated the organism field in favor of the new species field.
Deprecated the age field.
Introduced age ranges: age_min, age_max, and age_unit.
Diagnosis Schema:
Changed the type of the disease_diagnosis field from string to Ontology.
Sample Schema:
Changed the type of the tissue field from string to Ontology.
CellProcessing Schema:
Changed the type of the cell_subset field from string to Ontology.
Introduced the cell_species field which denotes the species from which the analyzed cells originate.
NucleicAcidProcessing Schema:
Defined the template_class field as type string.
Added a controlled vocabulary to the library_generation_method field.
Changed the controlled vocabulary terms of complete_sequences, replacing complete & untemplated with complete+untemplated and adding mixed.
Added the pcr_target field referencing the new PCRTarget schema object.
SequencingRun Schema:
Added the sequencing_run_id field which serves as the object identifier field.
Added the sequencing_files field which links to the RawSequenceData schema objects defining the raw read data.
RawSequenceData Schema:
Added the file_type field defining the sequence file type. This field is a controlled vocabulary restricted to: fasta, fastq.
Added the paired_read_length field defining mate-pair read lengths.
Defined the read_direction and paired_read_direction fields as type string.
DataProcessing Schema:
Replaces the SoftwareProcessing object.
Added data_processing_id, primary_annotation, data_processing_files, germline_database and analysis_provenance_id fields.
Version 1.2.1: Oct 5, 2018#
Minor patch release.
Schema gene vs segment terminology corrections
Added Info object
Updated cell_subset URL in AIRR schema
Version 1.2.0: Aug 18, 2018#
Peer reviewed release of the Rearrangement schema.
Definition change for the coordinate fields of the Rearrangement and Alignment schema. Coordinates are now defined as 1-based closed intervals, instead of 0-based half-open intervals (as previously defined in v1.1 of the schema).
Removed foreign study_id fields
Introduced keywords_study field
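The coordinate change described above (0-based half-open intervals in v1.1 versus 1-based closed intervals from v1.2 on) can be illustrated with a small conversion sketch. The function names are hypothetical and are not taken from the AIRR libraries. Only the start value shifts, because the exclusive end of a half-open interval equals the inclusive end of the corresponding 1-based closed interval.

```python
def to_one_based_closed(start, end):
    """Convert a 0-based half-open interval [start, end) to 1-based closed [start, end]."""
    return start + 1, end


def to_zero_based_half_open(start, end):
    """Convert a 1-based closed interval [start, end] to 0-based half-open [start, end)."""
    return start - 1, end


sequence = "ACGTACGT"

# 1-based closed coordinates 3..5 select the third through fifth bases.
start, end = to_zero_based_half_open(3, 5)
print(sequence[start:end])  # GTA

# Converting back recovers the original 1-based closed coordinates.
print(to_one_based_closed(start, end))  # (3, 5)
```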
Version 1.1.0: May 3, 2018#
Initial public release of the Rearrangement and Alignment schemas.
Added required and nullable constraints to AIRR schema.
Schema definitions for MiAIRR attributes and ontology.
Introduction of an x-airr object indicating if field is required by MiAIRR.
Rename rearrangement_set_id to data_processing_id.
Rename study_description to study_type.
Added physical_quantity format.
Moved raw sequencing files into separate schema object.
Rename Attributes object.
Added primary_annotation and repertoire_id.
Added diagnosis to repertoire object.
Added ontology for organism.
Added more detailed specification of sequencing_run, repertoire and rearrangement.
Added repertoire schema.
Rename definitions.yaml to airr-schema.yaml.
Removed c_call, c_score and c_cigar from required as this is not typical reference aligner output.
Renamed vdj_score, vdj_identity, vdj_evalue, and vdj_cigar to score, identity, evalue, and cigar.
Added missing c_identity and c_evalue fields to Rearrangement spec.
Swapped order of N and S operators in CIGAR string.
Some description clean up for consistency in Rearrangement spec.
Remove repeated objects in definitions.yaml.
Added Alignment object to definitions.yaml.
Updated MiAIRR format consistency check TSV with junction change.
Changed definition from functional to productive.
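For data written before the field renames listed above, a record can be updated with a simple name mapping. This is a minimal sketch: the mapping covers only the vdj_score, vdj_identity, vdj_evalue and vdj_cigar renames from these notes, and the helper name rename_fields and the sample record are hypothetical.

```python
# Old-to-new field names, taken from the rename entry in these release notes.
FIELD_RENAMES = {
    "vdj_score": "score",
    "vdj_identity": "identity",
    "vdj_evalue": "evalue",
    "vdj_cigar": "cigar",
}


def rename_fields(record):
    """Return a copy of record with legacy field names replaced by current ones."""
    return {FIELD_RENAMES.get(key, key): value for key, value in record.items()}


# Hypothetical legacy record, not real data.
legacy = {"sequence_id": "seq1", "vdj_score": "120.5", "vdj_cigar": "20S280M"}
print(rename_fields(legacy))
```

Fields that are not in the mapping pass through unchanged, so the helper can be applied to records that already use the current names.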
Version 1.0.1: Jan 9, 2018#
MiAIRR v1 official release and initial draft of Rearrangement and Alignment schemas.
Python Library Release Notes#
Version 1.4.0: In development#
Updated pandas requirement to 0.24.0 or higher.
Added support for missing integer values (NaN) in load_rearrangement by casting to the pandas Int64 data type.
Version 1.3.1: October 13, 2020#
Refactored merge_rearrangement to allow for a larger number of files.
Improved error handling in format validation operations.
Version 1.3.0: May 30, 2020#
Updated schema set to v1.3.
Added load_repertoire, write_repertoire, and validate_repertoire to airr.interface to read, write and validate Repertoire metadata, respectively.
Added repertoire_template to airr.interface which will return a complete repertoire object where all fields have null values.
Added validate_object to airr.schema that will validate a single repertoire object against the schema.
Extended the airr-tools command-line program to validate both rearrangement and repertoire files.
Version 1.2.1: October 5, 2018#
Fixed a bug in the python reference library causing start coordinate values to be empty in some cases when writing data.
Version 1.2.0: August 17, 2018#
Updated schema set to v1.2.
Several improvements to the validate_rearrangement function.
Changed behavior of all airr.interface functions to accept a file path (string) to a single Rearrangement TSV, instead of requiring a file handle as input.
Added base argument to RearrangementReader and RearrangementWriter to support optional conversion of 1-based closed intervals in the TSV to python-style 0-based half-open intervals. Defaults to conversion.
Added the custom exception ValidationError for handling validation checks.
Added the validate argument to RearrangementReader which will raise a ValidationError exception when reading files with missing required fields or invalid values for known field types.
Added validate argument to all type conversion methods in Schema, which will now raise a ValidationError exception for values that cannot be converted when set to True. When set to False (default), the previous behavior of assigning None as the converted value is retained.
Added validate_header and validate_row methods to Schema and removed validation methods from RearrangementReader.
Removed automatic closure of file handle upon reaching the iterator end in RearrangementReader.
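The validate behavior described above for type conversion can be sketched in isolation. This is a standalone illustration, not the library's code: the ValidationError class below is a local stand-in, and to_int is a hypothetical helper whose name and signature are assumptions.

```python
class ValidationError(Exception):
    """Local stand-in for the library exception of the same name."""


def to_int(value, validate=False):
    """Convert value to int (hypothetical helper mirroring the described behavior).

    With validate=False, an unconvertible value yields None.
    With validate=True, an unconvertible value raises ValidationError.
    """
    try:
        return int(value)
    except (TypeError, ValueError):
        if validate:
            raise ValidationError("invalid int: %r" % (value,))
        return None


print(to_int("42"))   # 42
print(to_int("n/a"))  # None

try:
    to_int("n/a", validate=True)
except ValidationError as error:
    print("rejected:", error)
```

Keeping validate off by default preserves the earlier lenient behavior, so strict checking is opt-in.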
Version 1.1.0: May 1, 2018#
Initial release.
R Library Release Notes#
Version 1.3.0: May 26, 2020#
Updated schema set to v1.3.
Added info slot to Schema object containing general schema information.
Version 1.2.0: August 17, 2018#
Updated schema set to v1.2.
Changed defaults to base="1" for read and write functions.
Updated example TSV file with coordinate changes, addition of germline_alignment data and simplification of sequence_id values.
Version 1.1.0: May 1, 2018#
Initial release.