Clone Schema
A Clone object groups a set of Rearrangements or Cells that are
inferred to be related by common descent from a single naive ancestor. The
member Rearrangements and Cells are referenced from Clone as an
array of Nodes and can include inferred ancestors that were not directly
observed. All members of a Clone must come from either a single
RepertoireGroup or, if no RepertoireGroup was created, a single
Repertoire.
A Node links members of a Clone to their original metadata and
annotations.
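The single-source membership rule above can be checked mechanically. The sketch below assumes Clone and Node records are plain Python dicts keyed by this schema's field names (`repertoire_group_id`, `repertoire_id`, `node_id`). The function name and the `group_members` argument, a mapping from a RepertoireGroup identifier to the Repertoire identifiers it contains, are hypothetical; in practice that mapping would come from the repertoire metadata.

```python
from typing import Iterable, Mapping, Optional, Set


def members_from_single_source(
    clone: Mapping,
    nodes: Iterable[Mapping],
    group_members: Optional[Mapping[str, Set[str]]] = None,
) -> bool:
    """Return True if every member Node comes from the Clone's declared source.

    Hypothetical helper: `group_members` maps a repertoire_group_id to the set
    of repertoire_id values in that RepertoireGroup (assumed input).
    """
    repertoire_ids = {node["repertoire_id"] for node in nodes}
    group_id = clone.get("repertoire_group_id")
    if group_id is not None:
        if group_members is None or group_id not in group_members:
            raise ValueError(f"no membership information for group {group_id!r}")
        # Every member must belong to a Repertoire inside the RepertoireGroup.
        return repertoire_ids <= set(group_members[group_id])
    repertoire_id = clone.get("repertoire_id")
    if repertoire_id is None:
        raise ValueError("clone has neither repertoire_group_id nor repertoire_id")
    # No RepertoireGroup: all members must share the Clone's single Repertoire.
    return repertoire_ids <= {repertoire_id}


clone = {"clone_id": "c1", "repertoire_group_id": None, "repertoire_id": "rep1"}
nodes = [
    {"node_id": "n1", "repertoire_id": "rep1"},
    {"node_id": "n2", "repertoire_id": "rep1"},
]
print(members_from_single_source(clone, nodes))  # True
```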
File Format Specification
Files are YAML/JSON with an AIRR Data File structure. Files should be
encoded as UTF-8. Identifiers are case-sensitive. Files should have the
extension .yaml, .yml, or .json.
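A minimal sketch of reading such a file, assuming only the Python standard library. The function name `load_clone_file` is hypothetical. JSON is parsed with the built-in `json` module; YAML files would need a separate YAML parser (for example PyYAML's `yaml.safe_load`), which this sketch does not assume is installed.

```python
import json
from pathlib import Path

# File extensions permitted by the specification.
ALLOWED_SUFFIXES = {".yaml", ".yml", ".json"}


def load_clone_file(path):
    """Read an AIRR Clone data file as UTF-8 and return the parsed dictionary.

    Hypothetical helper: only the JSON branch is implemented here.
    Keys in the returned dict are case-sensitive, matching the spec's
    case-sensitive identifiers.
    """
    path = Path(path)
    if path.suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected extension {path.suffix!r}")
    text = path.read_text(encoding="utf-8")  # files should be UTF-8 encoded
    if path.suffix == ".json":
        return json.loads(text)
    raise NotImplementedError("YAML parsing requires a YAML library such as PyYAML")
```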
File Structure
The DataFile is a dictionary (key/value pair) structure with the keys
`Clone` and `Node`.

- The file can optionally contain an `Info` object at the beginning of the file, based on the `Info` schema in the OpenAPI V2 specification. If provided, `version` in `Info` should reference the version of the AIRR schema used for the file.
- The `Clone` key should map to a list of Clone objects.
- The `Node` key should map to a list of Node objects.
- Each Clone object should contain a top-level key/value pair for `clone_id` that uniquely identifies the clone, and a top-level key/value pair for either `repertoire_group_id` or `repertoire_id` that identifies the source of the clone’s members.
- Each Node object should contain top-level key/value pairs for `node_id` (uniquely identifies the node) and `repertoire_id` (identifies the source repertoire).
- Some fields require the use of a particular ontology or controlled vocabulary.
The structure is the same regardless of whether the data is stored in a file or a data repository.
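The structural rules above can be illustrated with a small, hand-made data file and a checker. This is a sketch under stated assumptions: every identifier value is illustrative, `AIRR_SCHEMA_VERSION` is a placeholder rather than a real version string, and the function `structure_problems` is hypothetical. Optional Clone fields and the member-list field (whose name is not given in this section) are left out.

```python
import json

# Illustrative example file; identifiers and the version string are placeholders.
example = {
    "Info": {"title": "Example clone file", "version": "AIRR_SCHEMA_VERSION"},
    "Clone": [
        {"clone_id": "clone_1", "repertoire_group_id": None, "repertoire_id": "rep_1"},
    ],
    "Node": [
        {"node_id": "node_1", "repertoire_id": "rep_1", "sequence_id": "seq_1"},
        {"node_id": "node_2", "repertoire_id": "rep_1", "sequence_id": "seq_2"},
    ],
}


def structure_problems(data):
    """Return human-readable problems with the top-level file structure."""
    problems = []
    for key in ("Clone", "Node"):
        if not isinstance(data.get(key), list):
            problems.append(f"'{key}' must map to a list")
    clones = data["Clone"] if isinstance(data.get("Clone"), list) else []
    nodes = data["Node"] if isinstance(data.get("Node"), list) else []
    for clone in clones:
        if "clone_id" not in clone:
            problems.append("Clone without clone_id")
        # One of the two source identifiers must carry a value.
        if clone.get("repertoire_group_id") is None and clone.get("repertoire_id") is None:
            problems.append(f"Clone {clone.get('clone_id')!r} has no repertoire source")
    for node in nodes:
        for field in ("node_id", "repertoire_id"):
            if field not in node:
                problems.append(f"Node {node.get('node_id')!r} missing {field}")
    # clone_id and node_id are identifiers, so duplicates are errors.
    for label, ids in (("clone_id", [c.get("clone_id") for c in clones]),
                       ("node_id", [n.get("node_id") for n in nodes])):
        if len(ids) != len(set(ids)):
            problems.append(f"duplicate {label}")
    return problems


print(structure_problems(example))  # []
# Serialize as UTF-8 JSON, keeping non-ASCII characters as-is.
encoded = json.dumps(example, ensure_ascii=False, indent=2).encode("utf-8")
```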
Schema Field Definitions
Clone Fields
| Name | Type | Attributes | Definition |
|---|---|---|---|
| `clone_id` | string | required, identifier | Identifier for the clone. |
| `repertoire_group_id` | string | required, nullable | Identifier of the RepertoireGroup that this Clone is derived from. If the Clone is derived from a single Repertoire, repertoire_id may be used instead. |
| `repertoire_id` | string | required, nullable | Identifier of the Repertoire that this Clone is derived from. Should only be used if Clones are calculated from a single Repertoire without defining a RepertoireGroup. Otherwise use repertoire_group_id instead. |
| | string | optional, nullable | Is this a single-chain clone or a cell-based clone? |
| | string | optional, nullable | Identifier of the data processing object in the repertoire metadata for this clone. |
| | array of Node | required, nullable | List of Nodes that are members of this clone. |
| | string | optional, nullable | node_id string that acts as a key to the Node record for the inferred naive Rearrangement or Cell for this clone. |
| | integer | optional, nullable | Absolute count of the size (number of members) of this clone in the repertoire. This could simply be the number of sequences (Rearrangement records) observed in this clone, the number of distinct cell barcodes (unique cell_id values), or a more sophisticated calculation appropriate to the experimental protocol. Absolute count is provided versus a frequency so that downstream analysis tools can perform their own normalization. |
| | string | optional, nullable | sequence_id or cell_id of the seed sequence/cell. Empty string (or null) if there is no seed sequence or cell. |
| | string | optional, nullable | Newick string describing the tree. |
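The clone size definition allows more than one counting rule, and the rules can disagree for the same clone. A minimal sketch, assuming the clone's observed Rearrangement records are available as dicts carrying the AIRR Rearrangement fields `sequence_id` and `cell_id`; the two function names are hypothetical.

```python
from typing import Iterable, Mapping


def count_by_sequences(records: Iterable[Mapping]) -> int:
    """Clone size as the number of Rearrangement records observed in the clone."""
    return sum(1 for _ in records)


def count_by_cells(records: Iterable[Mapping]) -> int:
    """Clone size as the number of distinct, non-empty cell barcodes (cell_id)."""
    return len({r["cell_id"] for r in records if r.get("cell_id")})


# Three Rearrangement records coming from two distinct cells.
records = [
    {"sequence_id": "s1", "cell_id": "cellA"},
    {"sequence_id": "s2", "cell_id": "cellA"},
    {"sequence_id": "s3", "cell_id": "cellB"},
]
print(count_by_sequences(records))  # 3
print(count_by_cells(records))      # 2
```

Both results are absolute counts; as the definition notes, turning them into frequencies is left to downstream tools, which can choose their own denominator.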
Node Fields
| Name | Type | Attributes | Definition |
|---|---|---|---|
| `node_id` | string | required, identifier | Identifier for the node. |
| `repertoire_id` | string | required, nullable | Identifier of the repertoire containing the cell or rearrangement in this node. |
| `cell_id` | string | optional, nullable | Identifier of the cell contained by this node. NOTE: Mutually exclusive with Node.sequence_id |
| `sequence_id` | string | optional, nullable | Identifier of the rearrangement contained by this node. NOTE: Mutually exclusive with Node.cell_id |
| | string | optional, nullable | Node type (source). “Observed” for rearrangements/cells that were observed in the source data, “inferred” for unobserved intermediates that were phylogenetically inferred, or “simulated” for simulated rearrangements/cells. |
| | string | optional, nullable | Does this node contain a rearrangement or a cell? |
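The Node constraints above, together with the Clone field that refers to a naive Node by its node_id, suggest two small utilities: a per-record check and a node_id lookup table. This is a sketch assuming Node records are plain dicts; `node_problems` and `index_nodes` are hypothetical names, and only constraints stated in the table are checked.

```python
from typing import Dict, Iterable, List, Mapping


def node_problems(node: Mapping) -> List[str]:
    """Check one Node record against the constraints stated in the Node table."""
    problems = []
    if not node.get("node_id"):
        problems.append("missing node_id")
    if "repertoire_id" not in node:  # required, but the value may be null
        problems.append("missing repertoire_id")
    # cell_id and sequence_id are mutually exclusive.
    if node.get("cell_id") is not None and node.get("sequence_id") is not None:
        problems.append("cell_id and sequence_id are both set")
    return problems


def index_nodes(nodes: Iterable[Mapping]) -> Dict[str, Mapping]:
    """Build a node_id -> Node lookup, rejecting duplicate identifiers."""
    index: Dict[str, Mapping] = {}
    for node in nodes:
        node_id = node["node_id"]
        if node_id in index:
            raise ValueError(f"duplicate node_id {node_id!r}")
        index[node_id] = node
    return index
```

With such an index, resolving a Clone's naive-node reference is a single dictionary lookup by its node_id string.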