Rearrangement Schema#
A Rearrangement is a sequence that describes a rearranged adaptive immune receptor chain (e.g., antibody heavy chain or TCR beta chain) along with a host of annotations. These annotations are defined by the AIRR Rearrangement schema and comprise eight categories.
| Category | Description |
|---|---|
| Input | The input sequence to the V(D)J assignment process. |
| Identifiers | Primary and foreign key identifiers for linking AIRR data across files and databases. |
| Primary Annotations | The primary outputs of the V(D)J assignment process, which include the gene locus, V, D, J, and C gene calls, various flags, V(D)J junction sequence, copy number ( |
| Alignment Annotations | Detailed alignment annotations including the input and germline sequences used in the alignment; score, identity, statistical support (E-value, likelihood, etc); and the alignment itself through CIGAR strings for each aligned gene. |
| Alignment Positions | The start/end positions for genes in both the input and germline sequences. |
| Region Sequence | Sequence annotations for the framework regions (FWRs) and complementarity-determining regions (CDRs). |
| Region Positions | Positional annotations for the framework regions (FWRs) and complementarity-determining regions (CDRs). |
| Junction Lengths | Lengths for junction sub-regions associated with aspects of the V(D)J recombination process. |
File Format Specification#
Data for Rearrangement or Alignment objects are stored as rows in a tab-delimited file and should be compatible with any TSV reader.
Encoding#
The file should be encoded as ASCII or UTF-8.
Everything is case-sensitive.
Dialect#
The record separator is a newline \n and the field separator is a tab \t.
Fields or data should not be quoted.
A header line with the AIRR-specified column names is always required.
Values must not contain tab or newline characters.
Values should avoid @, #, and quote (" or ') characters, as the result may be implementation dependent.
Nested delimiters are not supported by the schema explicitly and should be avoided. However, if multiple values must be reported in a single column for an application specific reason, then the use of a comma as the delimiter is recommended.
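The dialect rules above can be sketched with Python's standard csv module. This is a minimal illustration rather than the reference implementation: the helper names write_airr_tsv and read_airr_tsv are hypothetical, and the example record and its comma-delimited v_call value are made up for demonstration.

```python
import csv
import io

def write_airr_tsv(rows, fieldnames, handle):
    """Write dict rows as tab-delimited text with a header line and no quoting."""
    writer = csv.DictWriter(
        handle,
        fieldnames=fieldnames,
        delimiter="\t",
        lineterminator="\n",
        quoting=csv.QUOTE_NONE,
    )
    writer.writeheader()
    for row in rows:
        # Reject values that would break the record or field structure.
        for name, value in row.items():
            if "\t" in str(value) or "\n" in str(value):
                raise ValueError(f"{name} contains a tab or newline")
        writer.writerow(row)

def read_airr_tsv(handle):
    """Read tab-delimited rows as dicts keyed by the header line."""
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))

# Hypothetical record; multiple values in one column use a comma delimiter.
buffer = io.StringIO()
write_airr_tsv(
    [{"sequence_id": "seq1", "v_call": "IGHV4-59*01,IGHV4-59*02"}],
    ["sequence_id", "v_call"],
    buffer,
)
records = read_airr_tsv(io.StringIO(buffer.getvalue()))
calls = records[0]["v_call"].split(",")
```

Disabling quoting on both the reader and the writer keeps the file readable by simple TSV parsers that split on tabs and newlines only.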
File names#
AIRR formatted TSV files should end with .tsv.
File Structure#
The data file has two sections in this order:
Header. A single line with column names.
Data values. One record per line.
A comment section preceding the header (e.g., # or @ blocks) is not part of the specification, but such a section is reserved for potential inclusion in a future release. As such, a comment section should not be included in the file as it may be incompatible with a future specification.
Header#
A single line containing the column names and specifying the field order. Any field that corresponds to one of the defined fields should use the specified field name.
Required columns#
Some of the fields are defined as required and therefore must always be present in the header. Note, however, that all columns allow for null values. Therefore, required columns exist to define a core set of fields that are always present in the table structure, but do not mandate that a value be reported.
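A header check for required columns might look like the sketch below. The function name missing_required is hypothetical, and REQUIRED_SUBSET is only an illustrative subset of field names mentioned elsewhere on this page; the authoritative list of required fields is the one marked required in the Fields table.

```python
def missing_required(header_line, required):
    """Return the required column names that are absent from a TSV header line."""
    present = set(header_line.rstrip("\n").split("\t"))
    return [name for name in required if name not in present]

# Illustrative subset only (an assumption); consult the schema for the full list.
REQUIRED_SUBSET = ["sequence_id", "sequence", "junction", "junction_aa", "v_cigar", "d_cigar"]

# Hypothetical header that includes one custom column.
header = "sequence_id\tsequence\tjunction\tv_cigar\tmy_custom_field\n"
missing = missing_required(header, REQUIRED_SUBSET)
```

Because required columns may still hold null values, a check of this kind only inspects the header and never the data rows.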
Custom columns#
There are no restrictions on inclusion of additional custom columns in the Rearrangements file, provided such columns do not use the same name as an existing required or optional field. It is recommended that custom fields follow the same naming scheme as existing fields: snake_case with narrowing scope when read from left to right. For example, sequence_id is the “identifier of the query sequence”.
Consider submitting a pull request for a field name reservation to the airr-standards repository if the field may be broadly useful.
Ordering#
There are no requirements that fields or records be sorted or ordered in any specific way. However, the field ordering provided by the schema is a recommended default, with top-to-bottom equating to left-to-right.
Data Values#
The possible data types are string, boolean, number (floating point), integer, and null (empty string).
Boolean values#
Boolean values must be encoded as T for true and F for false.
Null values#
All fields may contain null values. This includes columns that are described as required. A null value should be encoded as an empty string.
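The encoding rules for data values can be sketched as a pair of conversion helpers. The names parse_value and format_value are hypothetical, and the type labels passed in are simply the type names listed above; this is an illustration, not the behavior of any particular reader library.

```python
def parse_value(text, dtype):
    """Convert one TSV cell to a Python value; an empty string is null."""
    if text == "":
        return None
    if dtype == "boolean":
        if text not in ("T", "F"):
            raise ValueError(f"invalid boolean: {text!r}")
        return text == "T"
    if dtype == "integer":
        return int(text)
    if dtype == "number":
        return float(text)
    if dtype == "string":
        return text
    raise ValueError(f"unknown type: {dtype!r}")

def format_value(value):
    """Convert a Python value back to its TSV cell representation."""
    if value is None:
        return ""
    if isinstance(value, bool):
        return "T" if value else "F"
    return str(value)
```

Checking for the empty string first means that null is handled uniformly for every type, including required columns.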
Coordinate numbering#
All alignment sequence coordinates use the same scheme as IMGT and INSDC (DDBJ, ENA, GenBank), with the exception that partial coordinate information should not be used; instead, simply assign the start/end of the alignment. Meaning, coordinates should be provided as 1-based values with closed intervals, without the use of > or < annotations that denote a partial region.
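As a small illustration of the 1-based closed convention, the sketch below maps a start/end pair onto a Python slice, which is 0-based and half-open. The helper names extract_region and region_length are hypothetical.

```python
def extract_region(sequence, start, end):
    """Return the substring covered by a 1-based closed interval."""
    if start is None or end is None:
        return None
    if start < 1 or end < start or end > len(sequence):
        raise ValueError("coordinates out of range")
    # 1-based closed [start, end] maps to 0-based half-open [start - 1, end).
    return sequence[start - 1:end]

def region_length(start, end):
    """Number of positions in a 1-based closed interval."""
    return end - start + 1
```

For instance, extract_region("ACGTACGT", 2, 4) selects the second through fourth characters, and an interval from 419 to 434 spans 16 positions.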
CIGAR specification#
Alignment details are specified using the CIGAR format as defined in the SAM specifications, with some vocabulary restrictions on the use of clipping, skipping, and padding operators.
The CIGAR string defines the reference sequence as the germline sequence of the given gene or region; e.g., for v_cigar the reference is the V gene germline sequence. The query sequence is what was input into the alignment tool, which must correspond to what is contained in the sequence field of the Rearrangement data. For the majority of use cases, this will necessarily exclude alignment spacers from the CIGAR string, such as IMGT numbering gaps. However, any gaps appearing in the query sequence should be accounted for in the CIGAR string so that the alignment between the query and reference is correctly represented.
The valid operator sets and definitions are as follows:
| Operator | Description |
|---|---|
| = | An identical non-gap character. |
| X | A differing non-gap character. |
| M | A positional match in the alignment. This can be either an identical (=) or differing (X) non-gap character. |
| D | Deletion in the query (gap in the query). |
| I | Insertion in the query (gap in the reference). |
| S | Positions that appear in the query, but not the reference. Used exclusively to denote the start position of the alignment in the query. Should precede any N operators. |
| N | A space in the alignment. Used exclusively to denote the start position of the alignment in the reference. Should follow any S operators. |
Note, the use of either the =/X or M syntax is valid, but should be used consistently. While leading S and N operators are required, trailing S and N operators are optional.
For example, a D gene alignment that starts at position 419 in the query sequence (leading 418S), that is 16 nucleotides long with no indels (middle 16M), has a 10 nucleotide 5’ deletion (leading 10N), a 5 nucleotide 3’ deletion (trailing 5N), and ends 71 nucleotides before the end of the query sequence (trailing 71S) would have the following D gene CIGAR string (d_cigar) and positional information:
| Field | Value |
|---|---|
| d_cigar | 418S10N16M71S5N |
| d_sequence_start | 419 |
| d_sequence_end | 434 |
| d_germline_start | 11 |
| d_germline_end | 26 |
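The relationship between the CIGAR string and the positional fields in the example above can be sketched as follows. This is an illustrative parser, not the reference implementation: the function name cigar_positions and the keys of the returned dictionary are hypothetical, and it assumes the restricted vocabulary described above, in which S and N appear only before or after the aligned operators.

```python
import re

CIGAR_TOKEN = re.compile(r"(\d+)([=XMDISN])")

def cigar_positions(cigar):
    """Derive 1-based closed query and reference intervals from a CIGAR string."""
    tokens = [(int(n), op) for n, op in CIGAR_TOKEN.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in tokens) != cigar:
        raise ValueError(f"invalid CIGAR string: {cigar!r}")
    # Leading S and N operators give the offsets of the alignment start.
    i = 0
    query_offset = ref_offset = 0
    while i < len(tokens) and tokens[i][1] in "SN":
        n, op = tokens[i]
        if op == "S":
            query_offset += n
        else:
            ref_offset += n
        i += 1
    # Count the positions consumed by the aligned operators that follow.
    query_len = ref_len = 0
    while i < len(tokens) and tokens[i][1] not in "SN":
        n, op = tokens[i]
        if op in "=XM":
            query_len += n
            ref_len += n
        elif op == "I":  # present in the query only
            query_len += n
        elif op == "D":  # present in the reference only
            ref_len += n
        i += 1
    return {
        "sequence_start": query_offset + 1,
        "sequence_end": query_offset + query_len,
        "germline_start": ref_offset + 1,
        "germline_end": ref_offset + ref_len,
    }

positions = cigar_positions("418S10N16M71S5N")
```

Applied to the d_cigar value from the table, the sketch yields a query interval of 419 to 434 and a germline interval of 11 to 26. Trailing S and N operators are ignored because they do not affect the start or end of the aligned span.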
Definition Clarifications#
Junction versus CDR3#
We work with the IMGT definitions of the junction and CDR3 regions. Specifically, the IMGT JUNCTION includes the conserved cysteine and tryptophan/phenylalanine residues, while CDR3 excludes those two residues. Therefore, our junction and junction_aa fields which represent the extracted sequence include the two conserved residues, while the coordinate fields (cdr3_start and cdr3_end) exclude them.
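Given that the junction is the CDR3 plus the two flanking conserved codons, a CDR3 sequence can be derived from a junction by trimming one codon (or one residue) from each end. The helper names below are hypothetical and the example sequences are made up for illustration.

```python
def cdr3_from_junction(junction):
    """Strip the two conserved flanking codons from a junction nucleotide sequence."""
    if junction is None or len(junction) < 6:
        return None
    return junction[3:-3]

def cdr3_aa_from_junction_aa(junction_aa):
    """Strip the two conserved flanking residues from a junction amino acid sequence."""
    if junction_aa is None or len(junction_aa) < 2:
        return None
    return junction_aa[1:-1]

# Hypothetical junction: TGT (C) ... TGG (W) flank the CDR3.
cdr3 = cdr3_from_junction("TGTGCGAGAGATTACTGG")
cdr3_aa = cdr3_aa_from_junction_aa("CARDYW")
```

The sketch does not verify that the flanking codons actually encode the conserved residues; it only applies the positional definition.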
Productive#
The schema does not define a strict definition of a productive rearrangement. However, the IMGT definition is recommended:
Coding region has an open reading frame.
No defect in the start codon, splicing sites or regulatory elements.
No internal stop codons.
An in-frame junction region.
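Two of the criteria above, the absence of stop codons and an in-frame junction, can be checked from sequence alone. The sketch below is a partial heuristic under stated assumptions, not a full implementation of the IMGT definition: it assumes an ungapped coding sequence that begins in frame 1, it does not inspect the start codon, splicing sites, or regulatory elements, and all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def junction_in_frame(junction):
    """True if the junction length is a non-zero multiple of three."""
    return len(junction) > 0 and len(junction) % 3 == 0

def has_stop_codon(coding_sequence):
    """True if any complete codon, read from the first position, is a stop codon."""
    seq = coding_sequence.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)

def looks_productive(coding_sequence, junction):
    """Partial check covering only the stop codon and in-frame junction criteria."""
    return junction_in_frame(junction) and not has_stop_codon(coding_sequence)
```

A result of True from looks_productive is therefore necessary but not sufficient for a rearrangement to be productive under the recommended definition.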
Locus names#
A naming convention for locus names is not strictly enforced, but the IMGT locus names are recommended. For example, in the case of human data, this would be the set: IGH, IGK, IGL, TRA, TRB, TRD, or TRG.
Gene and allele names#
Gene call examples use the IMGT nomenclature, but no specific gene or allele nomenclature is strictly mandated. Species denotations may or may not be included in the gene name, as appropriate. For example, “Homo sapiens IGHV4-59*01”, “IGHV4-59*01” and “AB019438” are all valid entries for the same allele.
However, when using an established reference database to assign gene calls, adherence to the exact nomenclature used by the reference database is strongly recommended, as this will facilitate mapping to the database entries, cross-study comparison, and upload to public repositories.
Alignments#
There is no required alignment scheme for the nucleotide and amino acid alignment fields. These fields may, or may not, include numbering spacers (e.g., IMGT-numbering gaps), variations in case to denote mismatches, deletions, or other features appropriate to the tool that performed the alignment. The only strict requirement is that the query (sequence) and reference (germline) must be properly aligned.
Frameshifts#
For purposes of annotating alignments, a frameshift is defined as a shift in reading frame that is maintained until the end of the aligned gene, where frames are designated numerically as 1 (in-frame), 2, or 3. For example, a V gene alignment that starts in frame 1 and ends in frame 2, disrupting the conserved cysteine, would be defined as a frameshift, whereas a V gene alignment with an internal frameshift that is corrected by a second frameshift back to the original frame 1 prior to the conserved cysteine would not need to be annotated as a frameshift.
Fields#
The specification includes two classes of fields: those that are required and those that are optional. Required is defined as a column that must be present in the header of the TSV. Optional is defined as a column that may, or may not, appear in the TSV. All fields, including required fields, are nullable by assigning an empty string as the value. There are no requirements for column ordering in the schema, although the Python and R reference APIs enforce ordering for the sake of generating predictable output. The set of optional fields that provide alignment and region coordinates (“_start” and “_end” fields) are defined as 1-based closed intervals, similar to the SAM, VCF, GFF, IMGT, and INSDC formats (GenBank, ENA, and DDBJ; http://www.insdc.org).
Most fields have strict definitions for the values that they contain. However, some commonly provided information cannot be standardized across diverse toolchains, so a small selection of fields have context-dependent definitions. In particular, these context-dependent fields include the optional “_score,” “_identity,” and “_support” fields used for assessing the quality of alignments, which vary considerably in definition based on the methodology used. Similarly, the “_alignment” fields require strict alignment between the corresponding observed and germline sequences, but the manner in which that alignment is conveyed is somewhat flexible in that it allows for any numbering scheme (e.g., IMGT or Kabat) or lack thereof.
By default, data elements representing sequences in the schema contain nucleotide sequences except for data elements ending in “_aa,” which are amino acid translations of the associated nucleotide sequence.
While the format contains an extensive list of reserved field names, there are no restrictions on inclusion of custom fields in the TSV file, provided such custom fields have a unique name. Furthermore, suggestions for extending the format with additional reserved names are welcomed through the issue tracker on the GitHub repository (airr-community/airr-standards).
| Name | Type | Attributes | Definition |
|---|---|---|---|
| | string | required, identifier, nullable | Unique query sequence identifier for the Rearrangement. Most often this will be the input sequence header or a substring thereof, but may also be a custom identifier defined by the tool in cases where query sequences have been combined in some fashion prior to alignment. When downloaded from an AIRR Data Commons repository, this will usually be a universally unique record locator for linking with other objects in the AIRR Data Model. |
| | string | required, nullable | The query nucleotide sequence. Usually, this is the unmodified input sequence, which may be reverse complemented if necessary. In some cases, this field may contain consensus sequences or other types of collapsed input sequences if these steps are performed prior to alignment. |
| | string | optional, nullable | The Sanger/Phred quality scores for assessment of sequence quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (used by Illumina from v1.8). |
| | string | optional, nullable | Amino acid translation of the query nucleotide sequence. |
| | boolean | required, nullable | True if the alignment is on the opposite strand (reverse complemented) with respect to the query sequence. If True then all output data, such as alignment coordinates and sequences, are based on the reverse complement of ‘sequence’. |
| | boolean | required, nullable | True if the V(D)J sequence is predicted to be productive. |
| | boolean | optional, nullable | True if the V and J gene alignments are in-frame. |
| | boolean | optional, nullable | True if the aligned sequence contains a stop codon. |
| | boolean | optional, nullable | True if the sequence alignment spans the entire V(D)J region. Meaning, sequence_alignment includes both the first V gene codon that encodes the mature polypeptide chain (i.e., after the leader sequence) and the last complete codon of the J gene (i.e., before the J-C splice site). This does not require an absence of deletions within the internal FWR and CDR regions of the alignment. |
| | string | optional, nullable | Gene locus (chain type). Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature. |
| | | optional, nullable | Binomial designation of the species from which the locus originates. Typically, this value should be identical to organism, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g. transgenic animal models. If set, this key will overwrite the organism information for all lower layers of the schema. |
| | string | required, nullable | V gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHV4-59*01 if using IMGT/GENE-DB). |
| | string | required, nullable | First or only D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB). |
| | string | optional, nullable | Second D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB). |
| | string | required, nullable | J gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHJ4*02 if using IMGT/GENE-DB). |
| | string | optional, nullable | Constant region gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHG1*01 if using IMGT/GENE-DB). |
| | string | required, nullable | Aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps. Typically, this will include only the V(D)J region, but that is not a requirement. |
| | string | optional, nullable | Sanger/Phred quality scores for assessment of sequence_alignment quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (used by Illumina from v1.8). |
| | string | optional, nullable | Amino acid translation of the aligned query sequence. |
| | string | required, nullable | Assembled, aligned, full-length inferred germline sequence spanning the same region as the sequence_alignment field (typically the V(D)J region) and including the same set of corrections and spacers (if any). |
| | string | optional, nullable | Amino acid translation of the assembled germline sequence. |
| | string | required, nullable | Junction region nucleotide sequence, where the junction is defined as the CDR3 plus the two flanking conserved codons. |
| | string | required, nullable | Amino acid translation of the junction. |
| | string | optional, nullable | Nucleotide sequence of the combined N/P region between the V gene and first D gene alignment or between the V gene and J gene alignments. |
| | string | optional, nullable | Amino acid translation of the np1 field. |
| | string | optional, nullable | Nucleotide sequence of the combined N/P region between either the first D gene and J gene alignments or the first D gene and second D gene alignments. |
| | string | optional, nullable | Amino acid translation of the np2 field. |
| | string | optional, nullable | Nucleotide sequence of the combined N/P region between the second D gene and J gene alignments. |
| | string | optional, nullable | Amino acid translation of the np3 field. |
| | string | optional, nullable | Nucleotide sequence of the aligned CDR1 region. |
| | string | optional, nullable | Amino acid translation of the cdr1 field. |
| | string | optional, nullable | Nucleotide sequence of the aligned CDR2 region. |
| | string | optional, nullable | Amino acid translation of the cdr2 field. |
| | string | optional, nullable | Nucleotide sequence of the aligned CDR3 region. |
| | string | optional, nullable | Amino acid translation of the cdr3 field. |
| | string | optional, nullable | Nucleotide sequence of the aligned FWR1 region. |
| | string | optional, nullable | Amino acid translation of the fwr1 field. |
| | string | optional, nullable | Nucleotide sequence of the aligned FWR2 region. |
| | string | optional, nullable | Amino acid translation of the fwr2 field. |
| | string | optional, nullable | Nucleotide sequence of the aligned FWR3 region. |
| | string | optional, nullable | Amino acid translation of the fwr3 field. |
| | string | optional, nullable | Nucleotide sequence of the aligned FWR4 region. |
| | string | optional, nullable | Amino acid translation of the fwr4 field. |
| | number | optional, nullable | Alignment score for the V gene. |
| | number | optional, nullable | Fractional identity for the V gene alignment. |
| | number | optional, nullable | V gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the V gene assignment as defined by the alignment tool. |
| | string | required, nullable | CIGAR string for the V gene alignment. |
| | number | optional, nullable | Alignment score for the first or only D gene alignment. |
| | number | optional, nullable | Fractional identity for the first or only D gene alignment. |
| | number | optional, nullable | D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the first or only D gene as defined by the alignment tool. |
| | string | required, nullable | CIGAR string for the first or only D gene alignment. |
| | number | optional, nullable | Alignment score for the second D gene alignment. |
| | number | optional, nullable | Fractional identity for the second D gene alignment. |
| | number | optional, nullable | D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the second D gene as defined by the alignment tool. |
| | string | optional, nullable | CIGAR string for the second D gene alignment. |
| | number | optional, nullable | Alignment score for the J gene alignment. |
| | number | optional, nullable | Fractional identity for the J gene alignment. |
| | number | optional, nullable | J gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the J gene assignment as defined by the alignment tool. |
| | string | required, nullable | CIGAR string for the J gene alignment. |
| | number | optional, nullable | Alignment score for the C gene alignment. |
| | number | optional, nullable | Fractional identity for the C gene alignment. |
| | number | optional, nullable | C gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the C gene assignment as defined by the alignment tool. |
| | string | optional, nullable | CIGAR string for the C gene alignment. |
| | integer | optional, nullable | Start position of the V gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | End position of the V gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | Alignment start position in the V gene reference sequence (1-based closed interval). |
| | integer | optional, nullable | Alignment end position in the V gene reference sequence (1-based closed interval). |
| | integer | optional, nullable | Start position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | End position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | Start position of the first or only D gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | End position of the first or only D gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | Alignment start position in the D gene reference sequence for the first or only D gene (1-based closed interval). |
| | integer | optional, nullable | Alignment end position in the D gene reference sequence for the first or only D gene (1-based closed interval). |
| | integer | optional, nullable | Start position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | End position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | Start position of the second D gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | End position of the second D gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | Alignment start position in the second D gene reference sequence (1-based closed interval). |
| | integer | optional, nullable | Alignment end position in the second D gene reference sequence (1-based closed interval). |
| | integer | optional, nullable | Start position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | End position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | Start position of the J gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | End position of the J gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | Alignment start position in the J gene reference sequence (1-based closed interval). |
| | integer | optional, nullable | Alignment end position in the J gene reference sequence (1-based closed interval). |
| | integer | optional, nullable | Start position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | End position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | Start position of the C gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | End position of the C gene in the query sequence (1-based closed interval). |
| | integer | optional, nullable | Alignment start position in the C gene reference sequence (1-based closed interval). |
| | integer | optional, nullable | Alignment end position in the C gene reference sequence (1-based closed interval). |
| | integer | optional, nullable | Start position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | End position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| | integer | optional, nullable | CDR1 start position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | CDR1 end position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | CDR2 start position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | CDR2 end position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | CDR3 start position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | CDR3 end position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | FWR1 start position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | FWR1 end position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | FWR2 start position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | FWR2 end position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | FWR3 start position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | FWR3 end position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | FWR4 start position in the query sequence (1-based closed interval). |
| | integer | optional, nullable | FWR4 end position in the query sequence (1-based closed interval). |
| | string | optional, nullable | Aligned portion of query sequence assigned to the V gene, including any indel corrections or numbering spacers. |
| | string | optional, nullable | Amino acid translation of the v_sequence_alignment field. |
| | string | optional, nullable | Aligned portion of query sequence assigned to the first or only D gene, including any indel corrections or numbering spacers. |
| | string | optional, nullable | Amino acid translation of the d_sequence_alignment field. |
| | string | optional, nullable | Aligned portion of query sequence assigned to the second D gene, including any indel corrections or numbering spacers. |
| | string | optional, nullable | Amino acid translation of the d2_sequence_alignment field. |
| | string | optional, nullable | Aligned portion of query sequence assigned to the J gene, including any indel corrections or numbering spacers. |
| | string | optional, nullable | Amino acid translation of the j_sequence_alignment field. |
| | string | optional, nullable | Aligned portion of query sequence assigned to the constant region, including any indel corrections or numbering spacers. |
| | string | optional, nullable | Amino acid translation of the c_sequence_alignment field. |
| | string | optional, nullable | Aligned V gene germline sequence spanning the same region as the v_sequence_alignment field and including the same set of corrections and spacers (if any). |
| | string | optional, nullable | Amino acid translation of the v_germline_alignment field. |
| | string | optional, nullable | Aligned D gene germline sequence spanning the same region as the d_sequence_alignment field and including the same set of corrections and spacers (if any). |
| | string | optional, nullable | Amino acid translation of the d_germline_alignment field. |
| | string | optional, nullable | Aligned D gene germline sequence spanning the same region as the d2_sequence_alignment field and including the same set of corrections and spacers (if any). |
| | string | optional, nullable | Amino acid translation of the d2_germline_alignment field. |
| | string | optional, nullable | Aligned J gene germline sequence spanning the same region as the j_sequence_alignment field and including the same set of corrections and spacers (if any). |
| | string | optional, nullable | Amino acid translation of the j_germline_alignment field. |
| | string | optional, nullable | Aligned constant region germline sequence spanning the same region as the c_sequence_alignment field and including the same set of corrections and spacers (if any). |
| | string | optional, nullable | Amino acid translation of the c_germline_alignment field. |
| | integer | optional, nullable | Number of nucleotides in the junction sequence. |
| | integer | optional, nullable | Number of amino acids in the junction sequence. |
| | integer | optional, nullable | Number of nucleotides between the V gene and first D gene alignments or between the V gene and J gene alignments. |
| | integer | optional, nullable | Number of nucleotides between either the first D gene and J gene alignments or the first D gene and second D gene alignments. |
| | integer | optional, nullable | Number of nucleotides between the second D gene and J gene alignments. |
| | integer | optional, nullable | Number of untemplated nucleotides 5’ of the first or only D gene alignment. |
| | integer | optional, nullable | Number of untemplated nucleotides 3’ of the first or only D gene alignment. |
| | integer | optional, nullable | Number of untemplated nucleotides 3’ of the second D gene alignment. |
| | integer | optional, nullable | Number of palindromic nucleotides 3’ of the V gene alignment. |
| | integer | optional, nullable | Number of palindromic nucleotides 5’ of the first or only D gene alignment. |
| | integer | optional, nullable | Number of palindromic nucleotides 3’ of the first or only D gene alignment. |
| | integer | optional, nullable | Number of palindromic nucleotides 5’ of the second D gene alignment. |
| | integer | optional, nullable | Number of palindromic nucleotides 3’ of the second D gene alignment. |
| | integer | optional, nullable | Number of palindromic nucleotides 5’ of the J gene alignment. |
| | boolean | optional, nullable | True if the V gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the V gene reference sequence. |
| | boolean | optional, nullable | True if the J gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the J gene reference sequence. |
| | integer | optional, nullable | Numerical reading frame (1, 2, 3) of the first or only D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of the D gene reference sequence. |
| | integer | optional, nullable | Numerical reading frame (1, 2, 3) of the second D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of the D gene reference sequence. |
| | integer | optional, nullable | Number of reads contributing to the UMI consensus or contig assembly for this sequence. For example, the sum of the number of reads for all UMIs that contribute to the query sequence. |
| | integer | optional, nullable | Copy number or number of duplicate observations for the query sequence. For example, the number of identical reads observed for this sequence. |
| | integer | optional, nullable | Number of distinct UMIs represented by this sequence. For example, the total number of UMIs that contribute to the contig assembly for the query sequence. |
| | string | optional, identifier, nullable | Identifier defining the cell of origin for the query sequence. |
| | string | optional, identifier, nullable | Clonal cluster assignment for the query sequence. |
| | string | optional, identifier, nullable | Identifier to the associated repertoire in study metadata. |
| | string | optional, identifier, nullable | Identifier to the sample processing object in the repertoire metadata for this rearrangement. If the repertoire has a single sample then this field may be empty or missing. If the repertoire has multiple samples then this field may be empty or missing if the sample cannot be differentiated or the relationship is not maintained by the data processing. |
| | string | optional, identifier, nullable | Identifier to the data processing object in the repertoire metadata for this rearrangement. If this field is empty then the primary data processing object is assumed. |
| | string | DEPRECATED | Identifier for the Rearrangement object. May be identical to sequence_id, but will usually be a universally unique record locator for database applications. |
| | string | DEPRECATED | Identifier for grouping Rearrangement objects. |
| | string | DEPRECATED | Source of germline V(D)J genes with version number or date accessed. |
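The quality fields in the table above store Phred scores from 0 to 93 as ASCII characters 33 to 126. A minimal sketch of that encoding follows; the function names decode_phred33 and encode_phred33 are hypothetical.

```python
def decode_phred33(quality):
    """Convert an ASCII-encoded quality string to a list of integer Phred scores."""
    scores = []
    for char in quality:
        score = ord(char) - 33
        if not 0 <= score <= 93:
            raise ValueError(f"character {char!r} is outside ASCII 33 to 126")
        scores.append(score)
    return scores

def encode_phred33(scores):
    """Convert integer Phred scores (0 to 93) to an ASCII-encoded quality string."""
    for score in scores:
        if not 0 <= score <= 93:
            raise ValueError(f"score {score} is outside 0 to 93")
    return "".join(chr(score + 33) for score in scores)
```

Each character of the quality string corresponds to one position of the associated sequence, so the decoded list has the same length as the string.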