Clone Schema

A Clone object groups a set of Rearrangements or Cells that are inferred to be related by common descent from a single naive ancestor. The member Rearrangements and Cells are referenced from the Clone as an array of Nodes, which can include inferred ancestors that were not directly observed. All members of a Clone must come from a single RepertoireGroup or, if no RepertoireGroup was created, from a single Repertoire.

A Node links members of a Clone to their original metadata and annotations.
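
To make the membership rule concrete, here is a minimal Python sketch. The `Clone` and `Node` classes and the `has_single_source` method are hypothetical illustrations, not part of the AIRR specification or any AIRR library, and only a few fields are modeled. It also assumes that exactly one of `repertoire_group_id` and `repertoire_id` is non-null.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical in-memory form of a Node record (subset of fields)."""
    node_id: str
    repertoire_id: Optional[str]
    cell_id: Optional[str] = None
    sequence_id: Optional[str] = None


@dataclass
class Clone:
    """Hypothetical in-memory form of a Clone record (subset of fields)."""
    clone_id: str
    repertoire_group_id: Optional[str]
    repertoire_id: Optional[str]
    nodes: List[Node] = field(default_factory=list)

    def has_single_source(self) -> bool:
        """Check the single-RepertoireGroup / single-Repertoire rule."""
        # Assumption: exactly one of the two source identifiers is set.
        if (self.repertoire_group_id is None) == (self.repertoire_id is None):
            return False
        if self.repertoire_id is not None:
            # No RepertoireGroup: every member must come from this Repertoire.
            return all(n.repertoire_id == self.repertoire_id for n in self.nodes)
        # With a RepertoireGroup, confirming that each member's repertoire_id
        # belongs to the group would require the RepertoireGroup record itself.
        return True
```

For example, a Clone with `repertoire_id="r1"` whose Nodes all carry `repertoire_id="r1"` passes, while one containing a Node from `"r2"` does not.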

File Format Specification

Files are YAML/JSON with an AIRR Data File structure. Files should be encoded as UTF-8. Identifiers are case-sensitive. Files should have the extension .yaml, .yml, or .json.
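
A minimal Python sketch of reading such a file follows. The `load_data_file` name is hypothetical. JSON is parsed with the standard library; YAML needs a third-party parser, and PyYAML's `yaml.safe_load` is assumed here. The extension check is exact (no case folding).

```python
import json
from pathlib import Path

# Extensions listed in the file format specification.
ALLOWED_SUFFIXES = {".yaml", ".yml", ".json"}


def load_data_file(path) -> dict:
    """Read a Clone/Node data file as UTF-8 and parse it as JSON or YAML."""
    p = Path(path)
    if p.suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected file extension {p.suffix!r}")
    text = p.read_text(encoding="utf-8")  # files should be UTF-8 encoded
    if p.suffix == ".json":
        return json.loads(text)
    # YAML support assumes the third-party PyYAML package is installed.
    import yaml
    return yaml.safe_load(text)
```

Both parsers keep keys exactly as written, which fits the rule that identifiers are case-sensitive: `clone_id` and `Clone_ID` are different keys.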

File Structure

  • The DataFile is a dictionary (key/value pair) structure with the keys Clone and Node.

  • The file can optionally contain an Info object at the beginning of the file, based on the Info schema in the OpenAPI V2 specification. If provided, the version field in Info should reference the version of the AIRR schema used for the file.

  • The file should contain a list of Clone objects, stored under the key Clone.

  • The file should contain a list of Node objects, stored under the key Node.

  • Each Clone object should contain a top-level key/value pair for clone_id that uniquely identifies the clone and a top-level key/value pair for either repertoire_group_id or repertoire_id that identifies the source of the clone’s members.

  • Each Node object should contain top-level key/value pairs for node_id (uniquely identifies the node) and repertoire_id (identifies the source repertoire).

  • Some fields require the use of a particular ontology or controlled vocabulary.

  • The structure is the same regardless of whether the data is stored in a file or a data repository.
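
The structural rules above can be checked mechanically. The sketch below parses a small hand-written JSON example (all identifiers are made up, and the records are trimmed to the keys these rules mention) and reports violations. The `check_structure` function and its messages are hypothetical; it covers only the rules in this list, not field types or controlled vocabularies.

```python
import json

# A made-up minimal data file, trimmed to the keys the structural rules mention.
EXAMPLE = """
{
  "Clone": [{"clone_id": "clone1", "repertoire_id": "rep1"}],
  "Node": [{"node_id": "n1", "repertoire_id": "rep1"}]
}
"""


def _duplicates(values):
    """Return the set of values that occur more than once."""
    seen, dupes = set(), set()
    for v in values:
        if v in seen:
            dupes.add(v)
        seen.add(v)
    return dupes


def check_structure(doc: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # Optional Info object: if present, it should come first.
    if "Info" in doc and next(iter(doc)) != "Info":
        problems.append("Info should appear at the beginning of the file")
    lists = {}
    for key in ("Clone", "Node"):
        value = doc.get(key)
        if isinstance(value, list):
            lists[key] = [item for item in value if isinstance(item, dict)]
        else:
            problems.append(f"top-level key {key!r} should map to a list")
            lists[key] = []
    for i, clone in enumerate(lists["Clone"]):
        if "clone_id" not in clone:
            problems.append(f"Clone[{i}] has no clone_id")
        if "repertoire_group_id" not in clone and "repertoire_id" not in clone:
            problems.append(f"Clone[{i}] has neither repertoire_group_id nor repertoire_id")
    for i, node in enumerate(lists["Node"]):
        for key in ("node_id", "repertoire_id"):
            if key not in node:
                problems.append(f"Node[{i}] has no {key}")
    # clone_id and node_id are meant to be unique identifiers.
    for key, id_field in (("Clone", "clone_id"), ("Node", "node_id")):
        ids = [item[id_field] for item in lists[key] if id_field in item]
        for dup in _duplicates(ids):
            problems.append(f"duplicate {id_field} {dup!r}")
    return problems
```

Running `check_structure(json.loads(EXAMPLE))` yields an empty list; dropping the `Node` key or repeating a `clone_id` produces one message per problem.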

Schema Field Definitions

Clone Fields

| Name | Type | Attributes | Definition |
| --- | --- | --- | --- |
| clone_id | string | required, identifier | Identifier for the clone. |
| repertoire_group_id | string | required, nullable | Identifier of the RepertoireGroup that this Clone is derived from. If the Clone is derived from a single Repertoire, repertoire_id may be used instead. |
| repertoire_id | string | required, nullable | Identifier of the Repertoire that this Clone is derived from. Should only be used if Clones are calculated from a single Repertoire without defining a RepertoireGroup; otherwise use repertoire_group_id. |
| clone_class | string | optional, nullable | Is this a single-chain clone or a cell-based clone? |
| data_processing_id | string | optional, nullable | Identifier of the data processing object in the repertoire metadata for this clone. |
| nodes | array of Node | required, nullable | List of Nodes that are members of this clone. |
| inferred_ancestor | string | optional, nullable | node_id of the Node record for the inferred naive Rearrangement or Cell for this clone. |
| clone_count | integer | optional, nullable | Absolute count of the size (number of members) of this clone in the repertoire. This could simply be the number of sequences (Rearrangement records) observed in this clone, the number of distinct cell barcodes (unique cell_id values), or a more sophisticated calculation appropriate to the experimental protocol. An absolute count is provided rather than a frequency so that downstream analysis tools can perform their own normalization. |
| seed_id | string | optional, nullable | sequence_id or cell_id of the seed sequence or cell. Empty string (or null) if there is no seed sequence. |
| tree | string | optional, nullable | Newick string describing the tree for this clone. |
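
As an illustration of clone_count and inferred_ancestor, here is a hedged Python sketch of the two simple counting conventions named above (Rearrangement records, or distinct cell_id values) and of resolving inferred_ancestor to its Node record. The function names are hypothetical. It also assumes that nodes whose node_type is not "observed" (compared case-insensitively, since the exact spelling of the values is not fixed here) should be excluded from the count, and that a missing node_type means observed.

```python
from typing import Iterable, Mapping, Optional


def _is_observed(node: Mapping) -> bool:
    # Assumption: a missing node_type means "observed"; comparison is
    # case-insensitive because the exact spelling of the values is not fixed.
    return str(node.get("node_type") or "observed").lower() == "observed"


def clone_count(nodes: Iterable[Mapping], by: str = "sequence") -> int:
    """Count observed members using one of two simple conventions."""
    observed = [n for n in nodes if _is_observed(n)]
    if by == "sequence":
        # Number of observed Rearrangement records in the clone.
        return sum(1 for n in observed if n.get("sequence_id"))
    if by == "cell":
        # Number of distinct cell barcodes (unique cell_id values).
        return len({n["cell_id"] for n in observed if n.get("cell_id")})
    raise ValueError(f"unknown counting convention {by!r}")


def resolve_inferred_ancestor(
    clone: Mapping, nodes_by_id: Mapping[str, Mapping]
) -> Optional[Mapping]:
    """Look up the Node record named by clone['inferred_ancestor'], if any."""
    ancestor_id = clone.get("inferred_ancestor")
    if not ancestor_id:
        return None
    if ancestor_id not in nodes_by_id:
        raise KeyError(f"inferred_ancestor {ancestor_id!r} has no Node record")
    return nodes_by_id[ancestor_id]
```

Returning a raw count rather than a frequency matches the intent of clone_count: normalization is left to downstream tools.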

Node Fields

| Name | Type | Attributes | Definition |
| --- | --- | --- | --- |
| node_id | string | required, identifier | Identifier for the node. |
| repertoire_id | string | required, nullable | Identifier of the Repertoire that the cell or rearrangement contained by this node comes from. |
| cell_id | string | optional, nullable | Identifier of the cell contained by this node. NOTE: Mutually exclusive with Node.sequence_id. |
| sequence_id | string | optional, nullable | Identifier of the rearrangement contained by this node. NOTE: Mutually exclusive with Node.cell_id. |
| node_type | string | optional, nullable | Node type (source). “Observed” for rearrangements/cells that were observed in the source data, “inferred” for unobserved intermediates that were phylogenetically inferred, or “simulated” for simulated rearrangements/cells. |
| node_class | string | optional, nullable | Does this node contain a rearrangement or a cell? |
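
The per-node constraints in this table can be sketched as below. The function names are hypothetical. The node_type check assumes the three values named in the definition and matches them case-insensitively, because their exact spelling is an assumption. `contained_record` deliberately returns plain descriptive strings rather than node_class values, since no controlled vocabulary for node_class is given here.

```python
from typing import List, Mapping, Optional

# The three node_type values named in the definition; matching is
# case-insensitive because the exact spelling is an assumption.
NODE_TYPES = {"observed", "inferred", "simulated"}


def check_node_fields(node: Mapping) -> List[str]:
    """Return problems with the cell_id/sequence_id and node_type fields."""
    problems = []
    if node.get("cell_id") and node.get("sequence_id"):
        problems.append("cell_id and sequence_id are mutually exclusive")
    node_type = node.get("node_type")
    if node_type is not None and str(node_type).lower() not in NODE_TYPES:
        problems.append(f"unrecognized node_type {node_type!r}")
    return problems


def contained_record(node: Mapping) -> Optional[str]:
    """Say whether a node points at a cell, a rearrangement, or neither."""
    # Plain descriptive strings, not node_class vocabulary. Run
    # check_node_fields first: a node with both IDs set is invalid.
    if node.get("cell_id"):
        return "cell"
    if node.get("sequence_id"):
        return "rearrangement"
    return None
```

An inferred ancestor that carries neither a cell_id nor a sequence_id yields `None` from `contained_record` without being flagged by `check_node_fields`.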