Rearrangement Schema

A Rearrangement is a sequence that describes a rearranged adaptive immune receptor chain (e.g., antibody heavy chain or TCR beta chain) together with a set of annotations. These annotations are defined by the AIRR Rearrangement schema and fall into eight categories.

Category	Description
Input	The input sequence to the V(D)J assignment process.
Identifiers	Primary and foreign key identifiers for linking AIRR data across files and databases.
Primary Annotations	The primary outputs of the V(D)J assignment process, which includes the gene locus, V, D, J, and C gene calls, various flags, V(D)J junction sequence, copy number (`duplicate_count`), and the number of reads contributing to a consensus input sequence (`consensus_count`).
Alignment Annotations	Detailed alignment annotations, including the input and germline sequences used in the alignment; score, identity, and statistical support (E-value, likelihood, etc.); and the alignment itself, expressed as a CIGAR string for each aligned gene.
Alignment Positions	The start/end positions for genes in both the input and germline sequences.
Region Sequence	Sequence annotations for the framework regions (FWRs) and complementarity-determining regions (CDRs).
Region Positions	Positional annotations for the framework regions (FWRs) and complementarity-determining regions (CDRs).
Junction Lengths	Lengths for junction sub-regions associated with aspects of the V(D)J recombination process.

File Format Specification

Data for Rearrangement or Alignment objects are stored as rows in a tab-delimited file and should be compatible with any TSV reader.

Encoding

The file should be encoded as ASCII or UTF-8.
Everything is case-sensitive.

Dialect

The record separator is a newline \n and the field separator is a tab \t.
Fields or data should not be quoted.
A header line with the AIRR-specified column names is always required.
Values must not contain tab or newline characters.
Values should avoid @, #, and quote (" or ') characters, as the result may be implementation dependent.
Nested delimiters are not supported by the schema explicitly and should be avoided. However, if multiple values must be reported in a single column for an application specific reason, then the use of a comma as the delimiter is recommended.
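The dialect rules above map fairly directly onto Python's standard `csv` module. The sketch below is illustrative only: `write_airr_tsv` is a hypothetical helper, not part of any AIRR reference library. It writes a header line, uses tab and newline as separators, disables quoting, turns `None` or absent keys into empty strings (null), and rejects values containing tabs or newlines.

```python
import csv
import io

def write_airr_tsv(stream, fieldnames, rows):
    """Write dict rows to a text stream using the TSV dialect above.

    None and absent keys are written as empty strings (null values).
    """
    writer = csv.DictWriter(
        stream,
        fieldnames=fieldnames,
        delimiter="\t",          # field separator
        lineterminator="\n",     # record separator
        quoting=csv.QUOTE_NONE,  # fields are never quoted
        restval="",              # absent keys become null
    )
    writer.writeheader()  # a header line is always required
    for row in rows:
        clean = {}
        for key, value in row.items():
            text = "" if value is None else str(value)
            # Values must not contain tab or newline characters.
            if "\t" in text or "\n" in text:
                raise ValueError(f"value for {key!r} contains a tab or newline")
            clean[key] = text
        writer.writerow(clean)

if __name__ == "__main__":
    buf = io.StringIO()
    write_airr_tsv(buf, ["sequence_id", "productive", "v_call"],
                   [{"sequence_id": "seq1", "productive": "T", "v_call": None}])
    print(buf.getvalue(), end="")
```

Characters such as `@`, `#`, and quotes are not checked here; the specification only says they should be avoided, so a stricter writer could reject them as well.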

File names

AIRR-formatted TSV files should use the .tsv file extension.

File Structure

The data file has two sections in this order:

Header. A single line with column names.
Data values. One record per line.

A comment section preceding the header (e.g., # or @ blocks) is not part of the specification, but such a section is reserved for potential inclusion in a future release. As such, a comment section should not be included in the file as it may be incompatible with a future specification.
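A reader following this structure can treat the first line as the header and everything after it as records. The sketch below uses a hypothetical helper name, `read_airr_tsv`, and makes one assumption beyond the text: it raises an error when the first line starts with `#` or `@`, since such a comment section is not part of the current specification.

```python
import csv
import io

def read_airr_tsv(stream):
    """Read AIRR-style TSV records as dicts of raw strings.

    The first line must be the header; a leading '#' or '@' line is rejected
    because a comment section is not part of the current specification.
    """
    header_line = stream.readline()
    if not header_line:
        raise ValueError("missing header line")
    if header_line.startswith(("#", "@")):
        raise ValueError("comment section found before the header")
    fieldnames = header_line.rstrip("\n").split("\t")
    reader = csv.DictReader(
        stream,
        fieldnames=fieldnames,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # quote characters are not special
    )
    return list(reader)  # one dict per data line

if __name__ == "__main__":
    text = "sequence_id\tv_call\nseq1\tIGHV4-59*01\n"
    print(read_airr_tsv(io.StringIO(text)))
```

Values are returned as raw strings; converting them to booleans, numbers, or nulls is a separate step.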

Required columns

Some of the fields are defined as required and therefore must always be present in the header. Note, however, that all columns allow for null values. Therefore, required columns exist to define a core set of fields that are always present in the table structure, but do not mandate that a value be reported.
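Because "required" only constrains the header, a validator needs to check column presence, not cell contents. The sketch below hard-codes the required names from the field table in this section (`REQUIRED_COLUMNS` and `missing_required` are hypothetical names); other schema versions may differ.

```python
# Required columns, copied from the field table in this section.
REQUIRED_COLUMNS = (
    "sequence_id", "sequence", "rev_comp", "productive",
    "v_call", "d_call", "j_call",
    "sequence_alignment", "germline_alignment",
    "junction", "junction_aa",
    "v_cigar", "d_cigar", "j_cigar",
)

def missing_required(header):
    """Return the required column names absent from a header.

    Only presence is checked: a required column whose values are all empty
    (null) still satisfies the specification.
    """
    present = set(header)
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

Extra custom columns in the header are simply ignored by this check.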

Custom columns

There are no restrictions on including additional custom columns in the Rearrangements file, provided such columns do not reuse the name of an existing required or optional field. Custom fields should preferably follow the same naming scheme as existing fields: snake_case, with scope narrowing when read from left to right. For example, sequence_id is the “identifier of the query sequence”.

Consider submitting a pull request for a field name reservation to the airr-standards repository if the field may be broadly useful.
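A proposed custom column name can be screened for the two conventions above: no collision with a reserved field name, and snake_case spelling. This is a sketch under assumptions: `check_custom_column` is a hypothetical helper, the caller supplies the reserved names, and the regular expression (lowercase letters and digits joined by single underscores) is one reasonable reading of "snake_case". Whether scope narrows from left to right cannot be checked mechanically.

```python
import re

# Lowercase words of letters/digits separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

def check_custom_column(name, reserved):
    """Return a list of problems with a proposed custom column name.

    `reserved` is a set of schema field names supplied by the caller.
    """
    problems = []
    if name in reserved:
        problems.append("collides with a reserved field name")
    if not SNAKE_CASE.fullmatch(name):
        problems.append("not snake_case")
    return problems
```

An empty list means the name passes both checks.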

Ordering

There are no requirements that fields or records be sorted or ordered in any specific way. However, the field ordering provided by the schema is a recommended default, with top-to-bottom equating to left-to-right.

Data Values

The possible data types are string, boolean, number (floating point), integer, and null (empty string).

Boolean values

Boolean values must be encoded as T for true and F for false.

Null values

All fields may contain null values. This includes columns that are described as required. A null value should be encoded as an empty string.
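Taken together, the data type, boolean, and null rules suggest a small conversion routine for raw TSV strings. The sketch below (`parse_value` is a hypothetical name) maps an empty string to `None` for every type, accepts only the case-sensitive `T` and `F` for booleans, and uses Python `int` and `float` for the integer and number types.

```python
def parse_value(raw, type_name):
    """Convert a raw TSV string to a Python value for a schema data type.

    Empty strings are null (None) for every type; booleans must be T or F.
    """
    if raw == "":
        return None  # null, allowed even in required columns
    if type_name == "boolean":
        if raw == "T":
            return True
        if raw == "F":
            return False
        raise ValueError(f"invalid boolean {raw!r}; expected T or F")
    if type_name == "integer":
        return int(raw)
    if type_name == "number":
        return float(raw)
    return raw  # string
```

Because encoding is case-sensitive, values such as `t`, `TRUE`, or `True` are rejected rather than guessed.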

Coordinate numbering

All alignment sequence coordinates use the same scheme as IMGT and INSDC (DDBJ, ENA, GenBank), except that partial coordinate information should not be used; the start/end of the alignment should simply be reported instead. That is, coordinates should be provided as 1-based values with closed intervals, without the > or < annotations that denote a partial region.
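Since Python strings are 0-based with half-open slices, converting a 1-based closed interval only requires shifting the start down by one; the inclusive end can be used directly, and the region length is `end - start + 1`. The helper name `extract_region` below is hypothetical.

```python
def extract_region(sequence, start, end):
    """Return the substring for a 1-based closed interval [start, end].

    Python slices are 0-based and half-open, so the 1-based start shifts
    down by one while the inclusive end can be used directly.
    """
    if start is None or end is None:
        return None  # null coordinates
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates out of range")
    return sequence[start - 1:end]
```

For example, `extract_region(row_sequence, cdr3_start, cdr3_end)` would yield the CDR3 nucleotides of the query, assuming both coordinates are present.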

CIGAR specification

Alignment details are specified using the CIGAR format as defined in the SAM specification, with some vocabulary restrictions on the use of the clipping, skipping, and padding operators.

The CIGAR string defines the reference sequence as the germline sequence of the given gene or region; e.g., for v_cigar the reference is the V gene germline sequence. The query sequence is the input to the alignment tool, which must correspond to the contents of the sequence field of the Rearrangement data. In most cases, this means that alignment spacers, such as IMGT numbering gaps, are excluded from the CIGAR string. However, any gaps appearing in the query sequence should be accounted for in the CIGAR string so that the alignment between the query and reference is correctly represented.

The valid operator sets and definitions are as follows:

Operator	Description
=	An identical non-gap character.
X	A differing non-gap character.
M	A positional match in the alignment. This can be either an identical (=) or differing (X) non-gap character.
D	Deletion in the query (gap in the query).
I	Insertion in the query (gap in the reference).
S	Positions that appear in the query, but not the reference. Used exclusively to denote the start position of the alignment in the query. Should precede any N operators.
N	A space in the alignment. Used exclusively to denote the start position of the alignment in the reference. Should follow any S operators.

Note that either the =/X or the M syntax is valid, but the chosen syntax should be used consistently. While leading S and N operators are required, trailing S and N operators are optional.

For example, consider a D gene alignment that starts at position 419 in the query sequence (leading 418S), is 16 nucleotides long with no indels (middle 16M), has a 10-nucleotide 5’ deletion (leading 10N) and a 5-nucleotide 3’ deletion (trailing 5N), and is followed by 71 further nucleotides of the query sequence (trailing 71S). This alignment would have the following D gene CIGAR string (d_cigar) and positional information:

Field	Value
d_cigar	418S10N16M71S5N
d_sequence_start	419
d_sequence_end	434
d_germline_start	11
d_germline_end	26
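The positional fields in the example can be derived mechanically from the CIGAR string. The sketch below (`cigar_positions` is a hypothetical helper) walks the operators, counting query positions consumed by S, M, =, X, and I, and reference positions consumed by N, M, =, X, and D. The first aligned operator fixes the start coordinates, and trailing S and N operators do not move the end coordinates.

```python
import re

# Operators permitted by the restricted CIGAR vocabulary described above.
_CIGAR_TOKEN = re.compile(r"(\d+)([=XMDISN])")
_CIGAR_FULL = re.compile(r"(?:\d+[=XMDISN])+")

def cigar_positions(cigar):
    """Return (sequence_start, sequence_end, germline_start, germline_end).

    Coordinates are 1-based closed intervals. S consumes query positions and
    N consumes reference positions; only those preceding the aligned region
    shift the start. M, =, X, I, and D form the aligned region.
    """
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"unsupported CIGAR string {cigar!r}")
    query = ref = 0  # positions consumed so far
    seq_start = seq_end = germ_start = germ_end = None
    for count, op in _CIGAR_TOKEN.findall(cigar):
        n = int(count)
        if op == "S":
            query += n
        elif op == "N":
            ref += n
        else:
            if seq_start is None:  # first aligned operator
                seq_start, germ_start = query + 1, ref + 1
            if op in "M=X":
                query += n
                ref += n
            elif op == "I":
                query += n
            else:  # "D"
                ref += n
            seq_end, germ_end = query, ref
    return seq_start, seq_end, germ_start, germ_end
```

Applied to the d_cigar above, `cigar_positions("418S10N16M71S5N")` returns `(419, 434, 11, 26)`, matching d_sequence_start, d_sequence_end, d_germline_start, and d_germline_end.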

Definition Clarifications

Junction versus CDR3

We use the IMGT definitions of the junction and CDR3 regions. Specifically, the IMGT JUNCTION includes the conserved cysteine and tryptophan/phenylalanine residues, while CDR3 excludes those two residues. Therefore, the junction and junction_aa fields, which contain the extracted sequence, include the two conserved residues, while the coordinate fields (cdr3_start and cdr3_end) exclude them.
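Under these definitions the CDR3 is the junction with one codon (three nucleotides, or one amino acid) removed from each end, and the junction therefore spans cdr3_start - 3 through cdr3_end + 3 in the query. The helper names below are hypothetical, and the sketch assumes the junction contains no gap characters.

```python
def cdr3_from_junction(junction):
    """CDR3 nucleotides: the junction minus one codon on each side."""
    if not junction or len(junction) < 6:
        return None
    return junction[3:-3]

def cdr3_aa_from_junction_aa(junction_aa):
    """CDR3 amino acids: the junction minus the conserved C and W/F."""
    if not junction_aa or len(junction_aa) < 2:
        return None
    return junction_aa[1:-1]
```

For a junction `TGTGCGAGAGATTGG` (translating to `CARDW`), the CDR3 is `GCGAGAGAT` (`ARD`).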

Productive

The schema does not provide a strict definition of a productive rearrangement. However, the IMGT definition is recommended:

The coding region has an open reading frame.
No defect in the start codon, splicing sites, or regulatory elements.
No internal stop codons.
An in-frame junction region.
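Only some of these criteria can be evaluated from Rearrangement fields alone. The sketch below (`looks_productive` is a hypothetical helper) checks two of them: whether the junction length is a multiple of three, and whether the translated alignment contains a stop codon. It assumes stop codons are translated as `*`. Start codon, splice site, and regulatory element defects are not assessed, so this is a partial screen, not the IMGT definition.

```python
def looks_productive(junction, sequence_alignment_aa):
    """Partial check of two IMGT productivity criteria.

    Tests only for an in-frame junction and the absence of stop codons
    ('*') in the translated alignment; other criteria are not assessed.
    """
    if junction is None or sequence_alignment_aa is None:
        return None  # cannot decide; report null
    in_frame = len(junction) % 3 == 0
    no_stop = "*" not in sequence_alignment_aa
    return in_frame and no_stop
```

Returning `None` when inputs are missing mirrors the schema's use of null for undetermined values.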

Locus names

A naming convention for locus names is not strictly enforced, but the IMGT locus names are recommended. For example, in the case of human data, this would be the set: IGH, IGK, IGL, TRA, TRB, TRD, or TRG.

Gene and allele names

Gene call examples use the IMGT nomenclature, but no specific gene or allele nomenclature is strictly mandated. Species denotations may or may not be included in the gene name, as appropriate. For example, “Homo sapiens IGHV4-59*01”, “IGHV4-59*01” and “AB019438” are all valid entries for the same allele.

However, when an established reference database is used to assign gene calls, adherence to the exact nomenclature used by that database is strongly recommended, as this facilitates mapping to the database entries, cross-study comparison, and upload to public repositories.
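Because several naming styles are valid, downstream code should parse gene calls defensively. The sketch below (`split_calls` is a hypothetical helper) assumes IMGT-style `gene*allele` names and comma-separated ambiguous calls, following the comma recommendation for multi-valued columns. Names without `*`, such as accession numbers, are returned with no allele, and any species prefix stays attached to the gene name.

```python
def split_calls(call):
    """Split a gene call value into (gene, allele) pairs.

    Assumes IMGT-style names such as "IGHV4-59*01" and comma-separated
    ambiguous calls; names without '*' get allele None.
    """
    if not call:
        return []  # null value
    pairs = []
    for name in call.split(","):
        gene, sep, allele = name.strip().partition("*")
        pairs.append((gene, allele if sep else None))
    return pairs
```

For instance, an ambiguous `v_call` of `IGHV4-59*01,IGHV4-59*08` yields two pairs that share the gene `IGHV4-59`.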

Alignments

There is no required alignment scheme for the nucleotide and amino acid alignment fields. These fields may, or may not, include numbering spacers (e.g., IMGT-numbering gaps), variations in case to denote mismatches, deletions, or other features appropriate to the tool that performed the alignment. The only strict requirement is that the query (sequence) and reference (germline) must be properly aligned.
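Whatever gap or case conventions a tool uses, a properly aligned pair must at least have equal lengths so that positions correspond column by column. The sketch below (`check_alignment_pair` is a hypothetical helper) enforces that and counts mismatched columns. It assumes `.` and `-` are the only gap characters and compares case-insensitively, because case may be used to mark mismatches.

```python
def check_alignment_pair(sequence_alignment, germline_alignment):
    """Check that query and germline alignments line up; count mismatches.

    Equal length is necessary, though not sufficient, for a column-wise
    alignment. Columns where either side is a '.' or '-' gap are skipped,
    and comparison ignores case.
    """
    if len(sequence_alignment) != len(germline_alignment):
        raise ValueError("sequence and germline alignments differ in length")
    gaps = {".", "-"}
    mismatches = 0
    for q, g in zip(sequence_alignment.upper(), germline_alignment.upper()):
        if q in gaps or g in gaps:
            continue  # numbering spacer or indel column
        if q != g:
            mismatches += 1
    return mismatches
```

Tools that use other spacer symbols would need a different `gaps` set.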

Frameshifts

For the purposes of annotating alignments, a frameshift is a shift in reading frame that is maintained until the end of the aligned gene, where frames are designated numerically as 1 (in-frame), 2, or 3. For example, a V gene alignment that starts in frame 1 and ends in frame 2, disrupting the conserved cysteine, would be annotated as a frameshift. In contrast, a V gene alignment with an internal frameshift that is corrected by a second frameshift, returning to the original frame 1 before the conserved cysteine, would not need to be annotated as a frameshift.
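One way to apply this definition to a gene's CIGAR string is to look at the net effect of indels at the end of the alignment: inserted minus deleted query positions, modulo 3. A nonzero remainder means the shift persists to the end of the aligned gene, while compensating indels cancel out. The helper names below are hypothetical, and mapping the offset onto the frame labels 2 and 3 depends on a direction convention that this sketch does not fix.

```python
import re

def net_frame_shift(cigar):
    """Frame offset (0, 1, or 2) left by indels at the end of an alignment.

    Sums inserted (I) minus deleted (D) query positions, modulo 3.
    Compensating indels that restore the original frame cancel out.
    """
    shift = 0
    for count, op in re.findall(r"(\d+)([=XMDISN])", cigar):
        if op == "I":
            shift += int(count)
        elif op == "D":
            shift -= int(count)
    return shift % 3  # Python's % is non-negative for a positive modulus

def has_frameshift(cigar):
    """True if the frame shift persists to the end of the aligned gene."""
    return net_frame_shift(cigar) != 0
```

For example, a V alignment `100M1I50M1D100M` contains two internal frameshifts, but they compensate each other, so it is not flagged.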

Fields

The specification includes two classes of fields: those that are required and those that are optional. A required field is a column that must be present in the header of the TSV. An optional field is a column that may or may not appear in the TSV. All fields, including required fields, are nullable by assigning an empty string as the value. There are no requirements for column ordering in the schema, although the Python and R reference APIs enforce ordering for the sake of generating predictable output. The optional fields that provide alignment and region coordinates (“_start” and “_end” fields) are defined as 1-based closed intervals, similar to the SAM, VCF, GFF, IMGT, and INSDC formats (GenBank, ENA, and DDBJ; http://www.insdc.org).

Most fields have strict definitions for the values that they contain. However, some commonly provided information cannot be standardized across diverse toolchains, so a small selection of fields have context-dependent definitions. In particular, these context-dependent fields include the optional “_score,” “_identity,” and “_support” fields used for assessing alignment quality, whose definitions vary considerably depending on the methodology used. Similarly, the “_alignment” fields require strict alignment between the corresponding observed and germline sequences, but the manner in which that alignment is conveyed is somewhat flexible, in that any numbering scheme (e.g., IMGT or Kabat), or none at all, may be used.

By default, data elements representing sequences in the schema contain nucleotide sequences except for data elements ending in “_aa,” which are amino acid translations of the associated nucleotide sequence.

While the format contains an extensive list of reserved field names, there are no restrictions on inclusion of custom fields in the TSV file, provided such custom fields have a unique name. Furthermore, suggestions for extending the format with additional reserved names are welcomed through the issue tracker on the GitHub repository (airr-community/airr-standards).


Name	Type	Attributes	Definition
`sequence_id`	string	required, identifier, nullable	Unique query sequence identifier for the Rearrangement. Most often this will be the input sequence header or a substring thereof, but may also be a custom identifier defined by the tool in cases where query sequences have been combined in some fashion prior to alignment. When downloaded from an AIRR Data Commons repository, this will usually be a universally unique record locator for linking with other objects in the AIRR Data Model.
`sequence`	string	required, nullable	The query nucleotide sequence. Usually, this is the unmodified input sequence, which may be reverse complemented if necessary. In some cases, this field may contain consensus sequences or other types of collapsed input sequences if these steps are performed prior to alignment.
`quality`	string	optional, nullable	The Sanger/Phred quality scores for assessment of sequence quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (Used by Illumina from v1.8.)
`sequence_aa`	string	optional, nullable	Amino acid translation of the query nucleotide sequence.
`rev_comp`	boolean	required, nullable	True if the alignment is on the opposite strand (reverse complemented) with respect to the query sequence. If True then all output data, such as alignment coordinates and sequences, are based on the reverse complement of ‘sequence’.
`productive`	boolean	required, nullable	True if the V(D)J sequence is predicted to be productive.
`vj_in_frame`	boolean	optional, nullable	True if the V and J gene alignments are in-frame.
`stop_codon`	boolean	optional, nullable	True if the aligned sequence contains a stop codon.
`complete_vdj`	boolean	optional, nullable	True if the sequence alignment spans the entire V(D)J region. Meaning, sequence_alignment includes both the first V gene codon that encodes the mature polypeptide chain (i.e., after the leader sequence) and the last complete codon of the J gene (i.e., before the J-C splice site). This does not require an absence of deletions within the internal FWR and CDR regions of the alignment.
`locus`	string	optional, nullable	Gene locus (chain type). Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature.
`locus_species`	Ontology	optional, nullable	Binomial designation of the species from which the locus originates. Typically, this value should be identical to organism, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g., transgenic animal models. If set, this key will overwrite the organism information for all lower layers of the schema.
`v_call`	string	required, nullable	V gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHV4-59*01 if using IMGT/GENE-DB).
`d_call`	string	required, nullable	First or only D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB).
`d2_call`	string	optional, nullable	Second D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB).
`j_call`	string	required, nullable	J gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHJ4*02 if using IMGT/GENE-DB).
`c_call`	string	optional, nullable	Constant region gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHG1*01 if using IMGT/GENE-DB).
`sequence_alignment`	string	required, nullable	Aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps. Typically, this will include only the V(D)J region, but that is not a requirement.
`quality_alignment`	string	optional, nullable	Sanger/Phred quality scores for assessment of sequence_alignment quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (Used by Illumina from v1.8.)
`sequence_alignment_aa`	string	optional, nullable	Amino acid translation of the aligned query sequence.
`germline_alignment`	string	required, nullable	Assembled, aligned, full-length inferred germline sequence spanning the same region as the sequence_alignment field (typically the V(D)J region) and including the same set of corrections and spacers (if any).
`germline_alignment_aa`	string	optional, nullable	Amino acid translation of the assembled germline sequence.
`junction`	string	required, nullable	Junction region nucleotide sequence, where the junction is defined as the CDR3 plus the two flanking conserved codons.
`junction_aa`	string	required, nullable	Amino acid translation of the junction.
`np1`	string	optional, nullable	Nucleotide sequence of the combined N/P region between the V gene and first D gene alignment or between the V gene and J gene alignments.
`np1_aa`	string	optional, nullable	Amino acid translation of the np1 field.
`np2`	string	optional, nullable	Nucleotide sequence of the combined N/P region between either the first D gene and J gene alignments or the first D gene and second D gene alignments.
`np2_aa`	string	optional, nullable	Amino acid translation of the np2 field.
`np3`	string	optional, nullable	Nucleotide sequence of the combined N/P region between the second D gene and J gene alignments.
`np3_aa`	string	optional, nullable	Amino acid translation of the np3 field.
`cdr1`	string	optional, nullable	Nucleotide sequence of the aligned CDR1 region.
`cdr1_aa`	string	optional, nullable	Amino acid translation of the cdr1 field.
`cdr2`	string	optional, nullable	Nucleotide sequence of the aligned CDR2 region.
`cdr2_aa`	string	optional, nullable	Amino acid translation of the cdr2 field.
`cdr3`	string	optional, nullable	Nucleotide sequence of the aligned CDR3 region.
`cdr3_aa`	string	optional, nullable	Amino acid translation of the cdr3 field.
`fwr1`	string	optional, nullable	Nucleotide sequence of the aligned FWR1 region.
`fwr1_aa`	string	optional, nullable	Amino acid translation of the fwr1 field.
`fwr2`	string	optional, nullable	Nucleotide sequence of the aligned FWR2 region.
`fwr2_aa`	string	optional, nullable	Amino acid translation of the fwr2 field.
`fwr3`	string	optional, nullable	Nucleotide sequence of the aligned FWR3 region.
`fwr3_aa`	string	optional, nullable	Amino acid translation of the fwr3 field.
`fwr4`	string	optional, nullable	Nucleotide sequence of the aligned FWR4 region.
`fwr4_aa`	string	optional, nullable	Amino acid translation of the fwr4 field.
`v_score`	number	optional, nullable	Alignment score for the V gene.
`v_identity`	number	optional, nullable	Fractional identity for the V gene alignment.
`v_support`	number	optional, nullable	V gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the V gene assignment as defined by the alignment tool.
`v_cigar`	string	required, nullable	CIGAR string for the V gene alignment.
`d_score`	number	optional, nullable	Alignment score for the first or only D gene alignment.
`d_identity`	number	optional, nullable	Fractional identity for the first or only D gene alignment.
`d_support`	number	optional, nullable	D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the first or only D gene as defined by the alignment tool.
`d_cigar`	string	required, nullable	CIGAR string for the first or only D gene alignment.
`d2_score`	number	optional, nullable	Alignment score for the second D gene alignment.
`d2_identity`	number	optional, nullable	Fractional identity for the second D gene alignment.
`d2_support`	number	optional, nullable	D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the second D gene as defined by the alignment tool.
`d2_cigar`	string	optional, nullable	CIGAR string for the second D gene alignment.
`j_score`	number	optional, nullable	Alignment score for the J gene alignment.
`j_identity`	number	optional, nullable	Fractional identity for the J gene alignment.
`j_support`	number	optional, nullable	J gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the J gene assignment as defined by the alignment tool.
`j_cigar`	string	required, nullable	CIGAR string for the J gene alignment.
`c_score`	number	optional, nullable	Alignment score for the C gene alignment.
`c_identity`	number	optional, nullable	Fractional identity for the C gene alignment.
`c_support`	number	optional, nullable	C gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the C gene assignment as defined by the alignment tool.
`c_cigar`	string	optional, nullable	CIGAR string for the C gene alignment.
`v_sequence_start`	integer	optional, nullable	Start position of the V gene in the query sequence (1-based closed interval).
`v_sequence_end`	integer	optional, nullable	End position of the V gene in the query sequence (1-based closed interval).
`v_germline_start`	integer	optional, nullable	Alignment start position in the V gene reference sequence (1-based closed interval).
`v_germline_end`	integer	optional, nullable	Alignment end position in the V gene reference sequence (1-based closed interval).
`v_alignment_start`	integer	optional, nullable	Start position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`v_alignment_end`	integer	optional, nullable	End position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`d_sequence_start`	integer	optional, nullable	Start position of the first or only D gene in the query sequence (1-based closed interval).
`d_sequence_end`	integer	optional, nullable	End position of the first or only D gene in the query sequence (1-based closed interval).
`d_germline_start`	integer	optional, nullable	Alignment start position in the D gene reference sequence for the first or only D gene (1-based closed interval).
`d_germline_end`	integer	optional, nullable	Alignment end position in the D gene reference sequence for the first or only D gene (1-based closed interval).
`d_alignment_start`	integer	optional, nullable	Start position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`d_alignment_end`	integer	optional, nullable	End position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`d2_sequence_start`	integer	optional, nullable	Start position of the second D gene in the query sequence (1-based closed interval).
`d2_sequence_end`	integer	optional, nullable	End position of the second D gene in the query sequence (1-based closed interval).
`d2_germline_start`	integer	optional, nullable	Alignment start position in the second D gene reference sequence (1-based closed interval).
`d2_germline_end`	integer	optional, nullable	Alignment end position in the second D gene reference sequence (1-based closed interval).
`d2_alignment_start`	integer	optional, nullable	Start position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`d2_alignment_end`	integer	optional, nullable	End position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`j_sequence_start`	integer	optional, nullable	Start position of the J gene in the query sequence (1-based closed interval).
`j_sequence_end`	integer	optional, nullable	End position of the J gene in the query sequence (1-based closed interval).
`j_germline_start`	integer	optional, nullable	Alignment start position in the J gene reference sequence (1-based closed interval).
`j_germline_end`	integer	optional, nullable	Alignment end position in the J gene reference sequence (1-based closed interval).
`j_alignment_start`	integer	optional, nullable	Start position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`j_alignment_end`	integer	optional, nullable	End position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`c_sequence_start`	integer	optional, nullable	Start position of the C gene in the query sequence (1-based closed interval).
`c_sequence_end`	integer	optional, nullable	End position of the C gene in the query sequence (1-based closed interval).
`c_germline_start`	integer	optional, nullable	Alignment start position in the C gene reference sequence (1-based closed interval).
`c_germline_end`	integer	optional, nullable	Alignment end position in the C gene reference sequence (1-based closed interval).
`c_alignment_start`	integer	optional, nullable	Start position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`c_alignment_end`	integer	optional, nullable	End position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
`cdr1_start`	integer	optional, nullable	CDR1 start position in the query sequence (1-based closed interval).
`cdr1_end`	integer	optional, nullable	CDR1 end position in the query sequence (1-based closed interval).
`cdr2_start`	integer	optional, nullable	CDR2 start position in the query sequence (1-based closed interval).
`cdr2_end`	integer	optional, nullable	CDR2 end position in the query sequence (1-based closed interval).
`cdr3_start`	integer	optional, nullable	CDR3 start position in the query sequence (1-based closed interval).
`cdr3_end`	integer	optional, nullable	CDR3 end position in the query sequence (1-based closed interval).
`fwr1_start`	integer	optional, nullable	FWR1 start position in the query sequence (1-based closed interval).
`fwr1_end`	integer	optional, nullable	FWR1 end position in the query sequence (1-based closed interval).
`fwr2_start`	integer	optional, nullable	FWR2 start position in the query sequence (1-based closed interval).
`fwr2_end`	integer	optional, nullable	FWR2 end position in the query sequence (1-based closed interval).
`fwr3_start`	integer	optional, nullable	FWR3 start position in the query sequence (1-based closed interval).
`fwr3_end`	integer	optional, nullable	FWR3 end position in the query sequence (1-based closed interval).
`fwr4_start`	integer	optional, nullable	FWR4 start position in the query sequence (1-based closed interval).
`fwr4_end`	integer	optional, nullable	FWR4 end position in the query sequence (1-based closed interval).
`v_sequence_alignment`	string	optional, nullable	Aligned portion of query sequence assigned to the V gene, including any indel corrections or numbering spacers.
`v_sequence_alignment_aa`	string	optional, nullable	Amino acid translation of the v_sequence_alignment field.
`d_sequence_alignment`	string	optional, nullable	Aligned portion of query sequence assigned to the first or only D gene, including any indel corrections or numbering spacers.
`d_sequence_alignment_aa`	string	optional, nullable	Amino acid translation of the d_sequence_alignment field.
`d2_sequence_alignment`	string	optional, nullable	Aligned portion of query sequence assigned to the second D gene, including any indel corrections or numbering spacers.
`d2_sequence_alignment_aa`	string	optional, nullable	Amino acid translation of the d2_sequence_alignment field.
`j_sequence_alignment`	string	optional, nullable	Aligned portion of query sequence assigned to the J gene, including any indel corrections or numbering spacers.
`j_sequence_alignment_aa`	string	optional, nullable	Amino acid translation of the j_sequence_alignment field.
`c_sequence_alignment`	string	optional, nullable	Aligned portion of query sequence assigned to the constant region, including any indel corrections or numbering spacers.
`c_sequence_alignment_aa`	string	optional, nullable	Amino acid translation of the c_sequence_alignment field.
`v_germline_alignment`	string	optional, nullable	Aligned V gene germline sequence spanning the same region as the v_sequence_alignment field and including the same set of corrections and spacers (if any).
`v_germline_alignment_aa`	string	optional, nullable	Amino acid translation of the v_germline_alignment field.
`d_germline_alignment`	string	optional, nullable	Aligned D gene germline sequence spanning the same region as the d_sequence_alignment field and including the same set of corrections and spacers (if any).
`d_germline_alignment_aa`	string	optional, nullable	Amino acid translation of the d_germline_alignment field.
`d2_germline_alignment`	string	optional, nullable	Aligned D gene germline sequence spanning the same region as the d2_sequence_alignment field and including the same set of corrections and spacers (if any).
`d2_germline_alignment_aa`	string	optional, nullable	Amino acid translation of the d2_germline_alignment field.
`j_germline_alignment`	string	optional, nullable	Aligned J gene germline sequence spanning the same region as the j_sequence_alignment field and including the same set of corrections and spacers (if any).
`j_germline_alignment_aa`	string	optional, nullable	Amino acid translation of the j_germline_alignment field.
`c_germline_alignment`	string	optional, nullable	Aligned constant region germline sequence spanning the same region as the c_sequence_alignment field and including the same set of corrections and spacers (if any).
`c_germline_alignment_aa`	string	optional, nullable	Amino acid translation of the c_germline_alignment field.
`junction_length`	integer	optional, nullable	Number of nucleotides in the junction sequence.
`junction_aa_length`	integer	optional, nullable	Number of amino acids in the junction sequence.
`np1_length`	integer	optional, nullable	Number of nucleotides between the V gene and first D gene alignments or between the V gene and J gene alignments.
`np2_length`	integer	optional, nullable	Number of nucleotides between either the first D gene and J gene alignments or the first D gene and second D gene alignments.
`np3_length`	integer	optional, nullable	Number of nucleotides between the second D gene and J gene alignments.
`n1_length`	integer	optional, nullable	Number of untemplated nucleotides 5’ of the first or only D gene alignment.
`n2_length`	integer	optional, nullable	Number of untemplated nucleotides 3’ of the first or only D gene alignment.
`n3_length`	integer	optional, nullable	Number of untemplated nucleotides 3’ of the second D gene alignment.
`p3v_length`	integer	optional, nullable	Number of palindromic nucleotides 3’ of the V gene alignment.
`p5d_length`	integer	optional, nullable	Number of palindromic nucleotides 5’ of the first or only D gene alignment.
`p3d_length`	integer	optional, nullable	Number of palindromic nucleotides 3’ of the first or only D gene alignment.
`p5d2_length`	integer	optional, nullable	Number of palindromic nucleotides 5’ of the second D gene alignment.
`p3d2_length`	integer	optional, nullable	Number of palindromic nucleotides 3’ of the second D gene alignment.
`p5j_length`	integer	optional, nullable	Number of palindromic nucleotides 5’ of the J gene alignment.
`v_frameshift`	boolean	optional, nullable	True if the V gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the V gene reference sequence.
`j_frameshift`	boolean	optional, nullable	True if the J gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the J gene reference sequence.
`d_frame`	integer	optional, nullable	Numerical reading frame (1, 2, 3) of the first or only D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of the D gene reference sequence.
`d2_frame`	integer	optional, nullable	Numerical reading frame (1, 2, 3) of the second D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of the D gene reference sequence.
`consensus_count`	integer	optional, nullable	Number of reads contributing to the UMI consensus or contig assembly for this sequence. For example, the sum of the number of reads for all UMIs that contribute to the query sequence.
`duplicate_count`	integer	optional, nullable	Copy number or number of duplicate observations for the query sequence. For example, the number of identical reads observed for this sequence.
`umi_count`	integer	optional, nullable	Number of distinct UMIs represented by this sequence. For example, the total number of UMIs that contribute to the contig assembly for the query sequence.
`cell_id`	string	optional, identifier, nullable	Identifier defining the cell of origin for the query sequence.
`clone_id`	string	optional, identifier, nullable	Clonal cluster assignment for the query sequence.
`repertoire_id`	string	optional, identifier, nullable	Identifier to the associated repertoire in study metadata.
`sample_processing_id`	string	optional, identifier, nullable	Identifier to the sample processing object in the repertoire metadata for this rearrangement. If the repertoire has a single sample then this field may be empty or missing. If the repertoire has multiple samples then this field may be empty or missing if the sample cannot be differentiated or the relationship is not maintained by the data processing.
`data_processing_id`	string	optional, identifier, nullable	Identifier to the data processing object in the repertoire metadata for this rearrangement. If this field is empty, then the primary data processing object is assumed.
`rearrangement_id`	string	DEPRECATED	Identifier for the Rearrangement object. May be identical to sequence_id, but will usually be a universally unique record locator for database applications.
`rearrangement_set_id`	string	DEPRECATED	Identifier for grouping Rearrangement objects.
`germline_database`	string	DEPRECATED	Source of germline V(D)J genes with version number or date accessed.
