Rearrangement Schema

See the format overview for details on how to structure this data.

“Junction” versus “CDR3”

We work with the IMGT definitions of the junction/CDR3 regions. Specifically, the IMGT JUNCTION includes the conserved cysteine and tryptophan/phenylalanine residues, while CDR3 excludes those two residues. Therefore, the junction and junction_aa fields, which represent the extracted sequence, include the two conserved residues, while the coordinate fields (cdr3_start and cdr3_end) exclude them.
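The relationship between the two definitions can be sketched in a few lines of Python. This is only an illustration, not part of the schema: the function names are hypothetical, and it assumes the junction is ungapped and in-frame, so that each conserved residue corresponds to exactly one codon at either end.

```python
def junction_to_cdr3(junction: str) -> str:
    """Drop the two flanking conserved codons from a junction nucleotide sequence.

    Assumes an ungapped junction whose first and last codons are the
    conserved residues (hypothetical helper, not defined by the schema).
    """
    if len(junction) < 6:
        raise ValueError("junction is shorter than the two conserved codons")
    return junction[3:-3]


def junction_aa_to_cdr3_aa(junction_aa: str) -> str:
    """Drop the two flanking conserved residues from a junction amino acid sequence."""
    if len(junction_aa) < 2:
        raise ValueError("junction_aa is shorter than the two conserved residues")
    return junction_aa[1:-1]


# Made-up example: TGT GCG AGA GAT TAC TGG translates to C A R D Y W.
print(junction_to_cdr3("TGTGCGAGAGATTACTGG"))
print(junction_aa_to_cdr3_aa("CARDYW"))
```

With the made-up junction above, the first call yields the 12-nucleotide CDR3 and the second yields the four residues between the conserved cysteine and tryptophan.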

Fields


Name Type Priority Description
sequence_id string required Unique query sequence identifier within the file. Most often this will be the input sequence header or a substring thereof, but may also be a custom identifier defined by the tool in cases where query sequences have been combined in some fashion prior to alignment.
sequence string required The query nucleotide sequence. Usually, this is the unmodified input sequence, reverse complemented if necessary. In some cases, this field may contain consensus sequences or other types of collapsed input sequences, if these steps are performed prior to alignment.
sequence_aa string   Amino acid translation of the query nucleotide sequence.
rev_comp boolean required The alignment is on the opposite strand (reverse complemented), with respect to the query sequence. All output data, such as alignment positions and sequences, are based on the reverse complement of ‘sequence’.
productive boolean required True if the V(D)J sequence is predicted to be productive. Productive is typically defined using the IMGT definition: (1) coding region has an open reading frame, (2) no defect in the start codon, splicing sites or regulatory elements, (3) no internal stop codons and (4) an in-frame junction region.
vj_in_frame boolean   True if the V and J segment alignments are in-frame.
stop_codon boolean   True if the aligned sequence contains a stop codon.
locus string   Gene locus (chain type). For human data one of IGH, IGK, IGL, TRA, TRB, TRD, or TRG.
v_call string required V gene with allele (e.g. IGHV4-59*01)
d_call string required D gene with allele (e.g. IGHD3-10*01)
j_call string required J gene with allele (e.g. IGHJ4*02)
c_call string   C region with allele
sequence_alignment string required Aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps. Typically, this will include only the V(D)J region, but that is not a requirement.
sequence_alignment_aa string   Amino acid translation of the aligned query sequence.
germline_alignment string required Assembled, aligned, full-length inferred germline sequence spanning the same region as the sequence_alignment field (typically the V(D)J region) and including the same set of corrections and spacers (if any).
germline_alignment_aa string   Amino acid translation of the assembled germline sequence.
junction string required Junction region nucleotide sequence, where the junction is defined as the CDR3 plus the two flanking conserved codons.
junction_aa string required Junction region amino acid sequence.
np1 string   Nucleotide sequence of the combined N/P region between the V and D segments or V and J segments.
np1_aa string   Amino acid translation of the ‘np1’ field.
np2 string   Nucleotide sequence of the combined N/P region between the D and J segments.
np2_aa string   Amino acid translation of the ‘np2’ field.
cdr1 string   Nucleotide sequence of the aligned CDR1 region.
cdr1_aa string   Amino acid translation of the ‘cdr1’ field.
cdr2 string   Nucleotide sequence of the aligned CDR2 region.
cdr2_aa string   Amino acid translation of the ‘cdr2’ field.
cdr3 string   Nucleotide sequence of the aligned CDR3 region.
cdr3_aa string   Amino acid translation of the ‘cdr3’ field.
fwr1 string   Nucleotide sequence of the aligned FWR1 region.
fwr1_aa string   Amino acid translation of the ‘fwr1’ field.
fwr2 string   Nucleotide sequence of the aligned FWR2 region.
fwr2_aa string   Amino acid translation of the ‘fwr2’ field.
fwr3 string   Nucleotide sequence of the aligned FWR3 region.
fwr3_aa string   Amino acid translation of the ‘fwr3’ field.
fwr4 string   Nucleotide sequence of the aligned FWR4 region.
fwr4_aa string   Amino acid translation of the ‘fwr4’ field.
v_score number   V alignment score.
v_identity number   V alignment fractional identity.
v_support number   V alignment E-value, p-value, likelihood, probability or other similar measure of support for the V segment assignment as defined by the alignment tool.
v_cigar string required V alignment CIGAR string.
d_score number   D alignment score.
d_identity number   D alignment fractional identity.
d_support number   D alignment E-value, p-value, likelihood, probability or other similar measure of support for the D segment assignment as defined by the alignment tool.
d_cigar string required D alignment CIGAR string.
j_score number   J alignment score.
j_identity number   J alignment fractional identity.
j_support number   J alignment E-value, p-value, likelihood, probability or other similar measure of support for the J segment assignment as defined by the alignment tool.
j_cigar string required J alignment CIGAR string.
c_score number   C region alignment score.
c_identity number   C region alignment fractional identity.
c_support number   C alignment E-value, p-value, likelihood, probability or other similar measure of support for the C region assignment as defined by the alignment tool.
c_cigar string   C region alignment CIGAR string.
v_sequence_start integer   Start position of the V segment in the query sequence (0-based, half-open interval).
v_sequence_end integer   End position of the V segment in the query sequence (0-based, half-open interval).
v_germline_start integer   Alignment start position in the V reference sequence (0-based, half-open interval).
v_germline_end integer   Alignment end position in the V reference sequence (0-based, half-open interval).
v_alignment_start integer   Start position of the V segment in the ‘sequence_alignment’ and ‘germline_alignment’ fields (0-based, half-open interval).
v_alignment_end integer   End position of the V segment in the ‘sequence_alignment’ and ‘germline_alignment’ fields (0-based, half-open interval).
d_sequence_start integer   Start position of the D segment in the query sequence (0-based, half-open interval).
d_sequence_end integer   End position of the D segment in the query sequence (0-based, half-open interval).
d_germline_start integer   Alignment start position in the D reference sequence (0-based, half-open interval).
d_germline_end integer   Alignment end position in the D reference sequence (0-based, half-open interval).
d_alignment_start integer   Start position of the D segment in the ‘sequence_alignment’ and ‘germline_alignment’ fields (0-based, half-open interval).
d_alignment_end integer   End position of the D segment in the ‘sequence_alignment’ and ‘germline_alignment’ fields (0-based, half-open interval).
j_sequence_start integer   Start position of the J segment in the query sequence (0-based, half-open interval).
j_sequence_end integer   End position of the J segment in the query sequence (0-based, half-open interval).
j_germline_start integer   Alignment start position in the J reference sequence (0-based, half-open interval).
j_germline_end integer   Alignment end position in the J reference sequence (0-based, half-open interval).
j_alignment_start integer   Start position of the J segment in the ‘sequence_alignment’ and ‘germline_alignment’ fields (0-based, half-open interval).
j_alignment_end integer   End position of the J segment in the ‘sequence_alignment’ and ‘germline_alignment’ fields (0-based, half-open interval).
fwr1_start integer   FWR1 start position in the query sequence (0-based, half-open interval).
fwr1_end integer   FWR1 end position in the query sequence (0-based, half-open interval).
cdr1_start integer   CDR1 start position in the query sequence (0-based, half-open interval).
cdr1_end integer   CDR1 end position in the query sequence (0-based, half-open interval).
fwr2_start integer   FWR2 start position in the query sequence (0-based, half-open interval).
fwr2_end integer   FWR2 end position in the query sequence (0-based, half-open interval).
cdr2_start integer   CDR2 start position in the query sequence (0-based, half-open interval).
cdr2_end integer   CDR2 end position in the query sequence (0-based, half-open interval).
fwr3_start integer   FWR3 start position in the query sequence (0-based, half-open interval).
fwr3_end integer   FWR3 end position in the query sequence (0-based, half-open interval).
cdr3_start integer   CDR3 start position in the query sequence (0-based, half-open interval).
cdr3_end integer   CDR3 end position in the query sequence (0-based, half-open interval).
fwr4_start integer   FWR4 start position in the query sequence (0-based, half-open interval).
fwr4_end integer   FWR4 end position in the query sequence (0-based, half-open interval).
v_sequence_alignment string   V segment aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps.
v_sequence_alignment_aa string   Amino acid translation of the V segment aligned portion of the query sequence.
d_sequence_alignment string   D segment aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps.
d_sequence_alignment_aa string   Amino acid translation of the D segment aligned portion of the query sequence.
j_sequence_alignment string   J segment aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps.
j_sequence_alignment_aa string   Amino acid translation of the J segment aligned portion of the query sequence.
c_sequence_alignment string   Constant region aligned portion of query sequence, including any indel corrections or numbering spacers.
c_sequence_alignment_aa string   Amino acid translation of the constant region aligned portion of the query sequence.
v_germline_alignment string   Aligned V segment germline sequence spanning the same region as the v_sequence_alignment field and including the same set of corrections and spacers (if any).
v_germline_alignment_aa string   Amino acid translation of the aligned V segment germline sequence.
d_germline_alignment string   Aligned D segment germline sequence spanning the same region as the d_sequence_alignment field and including the same set of corrections and spacers (if any).
d_germline_alignment_aa string   Amino acid translation of the aligned D segment germline sequence.
j_germline_alignment string   Aligned J segment germline sequence spanning the same region as the j_sequence_alignment field and including the same set of corrections and spacers (if any).
j_germline_alignment_aa string   Amino acid translation of the aligned J segment germline sequence.
c_germline_alignment string   Aligned constant region germline sequence spanning the same region as the c_sequence_alignment field and including the same set of corrections and spacers (if any).
c_germline_alignment_aa string   Amino acid translation of the aligned constant region germline sequence.
junction_length integer   Number of nucleotides in the ‘junction’ sequence.
np1_length integer   Number of nucleotides between the V and D segments or V and J segments.
np2_length integer   Number of nucleotides between the D and J segments.
n1_length integer   Number of untemplated nucleotides 5’ of the D segment.
n2_length integer   Number of untemplated nucleotides 3’ of the D segment.
p3v_length integer   Number of palindromic nucleotides 3’ of the V segment.
p5d_length integer   Number of palindromic nucleotides 5’ of the D segment.
p3d_length integer   Number of palindromic nucleotides 3’ of the D segment.
p5j_length integer   Number of palindromic nucleotides 5’ of the J segment.
consensus_count integer   Number of reads contributing to the (UMI) consensus for this sequence. For example, the sum of the number of reads for all UMIs that contribute to the query sequence.
duplicate_count integer   Copy number or number of duplicate observations for the query sequence. For example, the number of UMIs sharing an identical sequence or the number of identical observations of this sequence absent UMIs.
cell_id string   Identifier defining the cell of origin for the query sequence.
clone_id string   Clonal cluster assignment for the query sequence.
rearrangement_id string   Identifier for the Rearrangement object. May be identical to sequence_id, but will usually be a universally unique record locator for database applications.
rearrangement_set_id string   Identifier for grouping Rearrangement objects.
germline_database string   Source of germline V(D)J segments, with version number or date accessed. For example, ‘IMGT/GENE-DB 3.1.18 (15 March 2018)’.
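
The sketch below shows one way the table above might be used in practice. It checks a header for the fields marked required and extracts a region from the query sequence with its coordinate fields. Several things here are assumptions rather than part of the schema: the data is taken to be tab-delimited with a header row of field names, missing values are taken to be empty strings, and the helper names (missing_required, region) are hypothetical. Because the coordinates are described as 0-based, half-open intervals, they are used directly as Python slice bounds.

```python
import csv
import io

# Fields marked "required" in the table above.
REQUIRED = (
    "sequence_id", "sequence", "rev_comp", "productive",
    "v_call", "d_call", "j_call",
    "sequence_alignment", "germline_alignment",
    "junction", "junction_aa",
    "v_cigar", "d_cigar", "j_cigar",
)


def missing_required(fieldnames):
    """Return the required field names absent from a header, in table order."""
    present = set(fieldnames)
    return [name for name in REQUIRED if name not in present]


def region(row, name):
    """Slice a region (e.g. 'cdr3', 'fwr1') out of row['sequence'].

    Uses <name>_start and <name>_end as 0-based, half-open coordinates.
    Returns None when either coordinate is absent or empty (assumed
    convention for missing values).
    """
    start = row.get(f"{name}_start", "")
    end = row.get(f"{name}_end", "")
    if not start or not end:
        return None
    return row["sequence"][int(start):int(end)]


# Made-up, deliberately incomplete record for illustration only.
example = "sequence_id\tsequence\tcdr3_start\tcdr3_end\nseq1\tAAACCCGGGTTT\t3\t9\n"
reader = csv.DictReader(io.StringIO(example), delimiter="\t")
print(missing_required(reader.fieldnames))
for row in reader:
    print(row["sequence_id"], region(row, "cdr3"))
```

The same region helper also works for the V, D and J segments in the query sequence if it is adapted to read the v_sequence_start/v_sequence_end style of field names, since those use the same interval convention.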