Clone and Lineage Tree Schema (Experimental)

A clone object represents a unique inferred clone that has been constructed within a single data processing for a single repertoire, from a subset of that repertoire's sequences and/or rearrangements.

A clone object may have one or more inferred lineage trees. Each tree is represented by a Newick string for its edges and a dictionary of node objects.
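As a rough illustration of how these pieces fit together, the sketch below builds one clone and one tree as plain Python dictionaries. The field names are taken from the tables in this section; every value (identifiers, sequences, the Newick string) is invented, and the only relationship assumed is that a tree points back to its clone through clone_id. Since the file format has not been specified, this nesting should not be read as a serialization format.

```python
import json

# Hypothetical clone record; all values are invented for illustration.
clone = {
    "clone_id": "clone_1",
    "repertoire_id": "rep_1",
    "sequences": ["seq1", "seq2"],
    "germline_alignment": "TGTGCGAGAGGCTACTGG",
}

# One inferred tree for that clone: edges as a Newick string,
# nodes as a dictionary keyed by sequence_id.
tree = {
    "tree_id": "tree_1",
    "clone_id": clone["clone_id"],
    "newick": "(seq1:0.01,seq2:0.02)root;",
    "nodes": {
        "root": {"sequence_id": "root", "sequence_alignment": "TGTGCGAGAGGCTACTGG"},
        "seq1": {"sequence_id": "seq1", "sequence_alignment": "TGTGCGAGAGGCTATTGG"},
        "seq2": {"sequence_id": "seq2", "sequence_alignment": "TGTGCGAAAGGCTACTGG"},
    },
}

print(json.dumps(tree, indent=2))
```

A clone with several inferred trees would simply have several such tree objects sharing the same clone_id.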

File Format Specification

The file format has not been specified yet.

Clone Fields


| Name | Type | Attributes | Definition |
| --- | --- | --- | --- |
| clone_id | string | required, nullable | Identifier for the clone. |
| repertoire_id | string | optional, nullable | Identifier of the associated repertoire in study metadata. |
| data_processing_id | string | optional, nullable | Identifier of the data processing object in the repertoire metadata for this clone. |
| sequences | array of string | optional, nullable | List of sequence_id strings that act as keys to the Rearrangement records for members of the clone. |
| v_call | string | optional, nullable | V gene with allele of the inferred ancestor of the clone. For example, IGHV4-59*01. |
| d_call | string | optional, nullable | D gene with allele of the inferred ancestor of the clone. For example, IGHD3-10*01. |
| j_call | string | optional, nullable | J gene with allele of the inferred ancestor of the clone. For example, IGHJ4*02. |
| junction | string | optional, nullable | Nucleotide sequence for the junction region of the inferred ancestor of the clone, where the junction is defined as the CDR3 plus the two flanking conserved codons. |
| junction_aa | string | optional, nullable | Amino acid translation of the junction. |
| junction_length | integer | optional, nullable | Number of nucleotides in the junction. |
| junction_aa_length | integer | optional, nullable | Number of amino acids in junction_aa. |
| germline_alignment | string | required, nullable | Assembled, aligned, full-length inferred ancestor of the clone spanning the same region as the sequence_alignment field of nodes (typically the V(D)J region) and including the same set of corrections and spacers (if any). |
| germline_alignment_aa | string | optional, nullable | Amino acid translation of germline_alignment. |
| v_alignment_start | integer | optional, nullable | Start position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| v_alignment_end | integer | optional, nullable | End position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| d_alignment_start | integer | optional, nullable | Start position of the D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| d_alignment_end | integer | optional, nullable | End position of the D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| j_alignment_start | integer | optional, nullable | Start position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| j_alignment_end | integer | optional, nullable | End position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). |
| junction_start | integer | optional, nullable | Junction region start position in the alignment (1-based closed interval). |
| junction_end | integer | optional, nullable | Junction region end position in the alignment (1-based closed interval). |
| umi_count | integer | optional, nullable | Number of distinct UMIs observed across all sequences (Rearrangement records) in this clone. |
| clone_count | integer | optional, nullable | Absolute count of the size (number of members) of this clone in the repertoire. This could simply be the number of sequences (Rearrangement records) observed in this clone, the number of distinct cell barcodes (unique cell_id values), or a more sophisticated calculation appropriate to the experimental protocol. An absolute count is provided rather than a frequency so that downstream analysis tools can perform their own normalization. |
| seed_id | string | optional, nullable | sequence_id of the seed sequence. Empty string (or null) if there is no seed sequence. |
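The position fields above all use a 1-based closed interval, which differs from Python's 0-based, half-open slices. The sketch below shows one way to convert between the two. The helper name closed_interval and the example values are hypothetical, and it assumes that junction_start and junction_end index into germline_alignment.

```python
def closed_interval(sequence: str, start: int, end: int) -> str:
    """Return the region start..end, where both bounds are 1-based and inclusive."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("interval lies outside the aligned sequence")
    # Shift only the start: a 1-based inclusive end equals a 0-based exclusive end.
    return sequence[start - 1:end]


# Hypothetical clone fragment; values are invented for illustration.
clone = {
    "germline_alignment": "GGGTGTGCGAGAGGCTACTGGAAA",
    "junction_start": 4,
    "junction_end": 21,
}

junction = closed_interval(
    clone["germline_alignment"], clone["junction_start"], clone["junction_end"]
)
# For a closed interval the length is end - start + 1.
junction_length = clone["junction_end"] - clone["junction_start"] + 1
```

The same helper applies to the v_alignment, d_alignment and j_alignment start and end pairs.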

Tree Fields


| Name | Type | Attributes | Definition |
| --- | --- | --- | --- |
| tree_id | string | required, nullable | Identifier for the tree. |
| clone_id | string | required, nullable | Identifier for the clone. |
| newick | string | required, nullable | Newick string of the tree edges. |
| nodes | object | optional, nullable | Dictionary of nodes in the tree, keyed by sequence_id string. |
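Because the newick string and the nodes dictionary describe the same tree, it can be useful to compare the labels in one against the keys of the other. The sketch below is a deliberately simple check: the function newick_labels is hypothetical, it handles only plain Newick (no quoted labels or bracketed comments), and it only reports differences, since this section does not state whether every Newick label must have a node entry.

```python
import re


def newick_labels(newick: str) -> set[str]:
    """Collect node labels from a plain Newick string (no quotes or comments)."""
    # Drop branch lengths such as ":0.05", then split on structural characters.
    without_lengths = re.sub(r":[^,();]*", "", newick)
    tokens = re.split(r"[(),;]", without_lengths)
    return {token.strip() for token in tokens if token.strip()}


# Hypothetical tree; the node entry for "node1" is left out on purpose.
tree = {
    "tree_id": "tree_1",
    "clone_id": "clone_1",
    "newick": "((seq1:0.1,seq2:0.2)node1:0.05,seq3:0.3)root;",
    "nodes": {
        "seq1": {"sequence_id": "seq1"},
        "seq2": {"sequence_id": "seq2"},
        "seq3": {"sequence_id": "seq3"},
        "root": {"sequence_id": "root"},
    },
}

labels = newick_labels(tree["newick"])
labels_without_node = labels - set(tree["nodes"])
nodes_without_label = set(tree["nodes"]) - labels
```

For anything beyond simple strings, a dedicated Newick parser would be the safer choice.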

Node Fields


| Name | Type | Attributes | Definition |
| --- | --- | --- | --- |
| sequence_id | string | required, nullable | Identifier for this node that matches the identifier in the newick string and, where possible, the sequence_id in the source repertoire. |
| sequence_alignment | string | optional, nullable | Nucleotide sequence of the node, aligned to the germline_alignment for this clone, including any indel corrections or spacers. |
| junction | string | optional, nullable | Junction region nucleotide sequence for the node, where the junction is defined as the CDR3 plus the two flanking conserved codons. |
| junction_aa | string | optional, nullable | Amino acid translation of the junction. |
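The relationship between junction and junction_aa can be sketched with a small translation helper. This is only an illustration: translate_junction is a hypothetical name, the standard genetic code is assumed, and rendering any codon that contains gaps or ambiguous bases as "X" is a choice made here, not something the schema prescribes.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_junction(junction: str) -> str:
    """Translate complete codons; unknown codons become 'X' (an assumption)."""
    junction = junction.upper()
    usable = len(junction) - len(junction) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(junction[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Hypothetical node; the sequence is invented for illustration.
node = {"sequence_id": "seq1", "junction": "TGTGCGAGAGGCTACTGG"}
node["junction_aa"] = translate_junction(node["junction"])
```

The same helper could fill the clone-level junction_aa, with junction_length and junction_aa_length taken as the lengths of the two strings.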