Single-cell Schema
The Cell object acts as the point of reference for all data that can be related to an individual cell, whether by direct observation or by inference.
File Format Specification
Files are YAML or JSON documents that follow the AIRR Data File structure. Files should be
encoded as UTF-8 and should have the extension .yaml, .yml, or .json. Identifiers are
case-sensitive.
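A minimal Python sketch of reading such a file. It uses only the standard library for JSON; YAML parsing is delegated to the third-party PyYAML package, which is an assumption and not something this specification mandates. The function name `load_airr_data_file` is hypothetical.

```python
import json
from pathlib import Path

# Extensions permitted by the file format specification (matched exactly).
ALLOWED_SUFFIXES = {".yaml", ".yml", ".json"}


def load_airr_data_file(path):
    """Load an AIRR Data File into a dict (sketch; JSON handled natively)."""
    p = Path(path)
    if p.suffix not in ALLOWED_SUFFIXES:
        raise ValueError(
            f"unsupported extension {p.suffix!r}; expected .yaml, .yml, or .json"
        )
    # Files should be UTF-8; decoding raises UnicodeDecodeError otherwise.
    text = p.read_text(encoding="utf-8")
    if p.suffix == ".json":
        return json.loads(text)
    # Assumption: YAML files are parsed with PyYAML (pip install pyyaml).
    import yaml
    return yaml.safe_load(text)
```

Because identifiers are case-sensitive, records loaded this way should be looked up by exact string comparison (for example, plain dictionary keys) without any case folding.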
Schema Field Definitions
Cell Fields
| Name | Type | Attributes | Definition |
|---|---|---|---|
|  | string | required, identifier | Identifier for the Cell object. This identifier must be unique within a given study, but it is recommended that it be a universally unique record locator to enable database applications. |
|  | string | required, nullable | Identifier for the associated repertoire in study metadata. |
|  | string | optional, nullable | Identifier of the data processing object in the repertoire metadata for this cell. |
|  | array of string | optional, nullable | Array of receptor identifiers defined for the Receptor objects associated with this cell. |
|  |  | optional, nullable | Commonly used designation of the isolated cell population. |
|  | string | optional, nullable | List of cellular markers and their expression levels used to isolate the cell population. |
|  | string | optional, nullable | Free-text cell type annotation, primarily used for cell types that are not provided in the Cell Ontology. |
|  | boolean | required, nullable | Boolean indicating whether the pairing was inferred. |
|  | string | optional, nullable | Cell type (source). Most often the type should be “observed,” meaning the Cell corresponds to an actual physical sample that was sequenced. The other allowed values are “simulated,” for Cells generated in silico with no linked Subject/Sample metadata, and “inferred,” for Cells phylogenetically reconstructed from observed sequences. Inferred Cells should be assigned to a Repertoire with repertoire_type=inferred. |
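The Attributes column can be read as two independent flags: *required* means the field must be present, and *nullable* means a present field may hold null. The Python sketch below checks a Cell record against those rules and checks identifier uniqueness within a study. The field names are not reproduced in the table above, so the keys used here (`cell_id`, `repertoire_id`, `data_processing_id`, `receptors`, `virtual_pairing`) are assumptions to verify against the published AIRR schema, and `validate_cell` and `check_unique_ids` are hypothetical helpers covering only a subset of the fields.

```python
# Hypothetical field names: the table omits them, so these keys are
# assumptions to be checked against the published AIRR schema.
CELL_FIELDS = {
    # name: (expected Python type, required, nullable)
    "cell_id": (str, True, False),
    "repertoire_id": (str, True, True),
    "data_processing_id": (str, False, True),
    "receptors": (list, False, True),
    "virtual_pairing": (bool, True, True),
}


def validate_cell(cell):
    """Return a list of problems found in one Cell record (empty if none)."""
    problems = []
    for name, (typ, required, nullable) in CELL_FIELDS.items():
        if name not in cell:
            # Only required fields must be present.
            if required:
                problems.append(f"missing required field {name!r}")
            continue
        value = cell[name]
        if value is None:
            # A present field may be null only if it is nullable.
            if not nullable:
                problems.append(f"field {name!r} is not nullable")
        elif not isinstance(value, typ):
            problems.append(f"field {name!r} should be {typ.__name__}")
        elif typ is list and not all(isinstance(v, str) for v in value):
            # "array of string": every element must be a string.
            problems.append(f"field {name!r} should contain only strings")
    return problems


def check_unique_ids(cells):
    """Return identifiers that occur more than once (case-sensitive)."""
    seen, dupes = set(), set()
    for cell in cells:
        cid = cell.get("cell_id")
        if cid in seen:
            dupes.add(cid)
        seen.add(cid)
    return dupes
```

Keeping the rules in a single table-like dictionary mirrors the layout of the schema table, so adding a field means adding one entry rather than new branching logic.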
Expression Fields
| Name | Type | Attributes | Definition |
|---|---|---|---|
|  | string | required, identifier | Identifier for the Expression object. This identifier must be unique within a given study, but it is recommended that it be a universally unique record locator to enable database applications. |
|  | string | required | Identifier of the cell to which this expression data is related. |
|  | string | required, nullable | Identifier for the associated repertoire in study metadata. |
|  | string | required, nullable | Identifier of the data processing object in the repertoire metadata for this cell. |
|  | string | required | Keyword describing the property type and detection method used to measure the property value. The following keywords are recommended, but custom property types are also valid: “mrna_expression_by_read_count”, “protein_expression_by_fluorescence_intensity”, “antigen_bait_binding_by_fluorescence_intensity”, “protein_expression_by_dna_barcode_count”, and “antigen_bait_binding_by_dna_barcode_count”. |
|  |  | required, nullable | Name of the property observed, typically a gene or antibody identifier (and label) from a canonical resource such as Ensembl (e.g. ENSG00000275747, IGHV3-79) or Antibody Registry (ABREG:1236456, Purified anti-mouse/rat/human CD27 antibody). |
|  | number | required, nullable | Level at which the property was observed in the experiment (non-normalized). |
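Expression records point back to a cell through its identifier, and the property-type keyword list is a recommendation rather than a closed set. The Python sketch below groups Expression records by cell and flags, rather than rejects, custom property types. As with the table, field names are not shown, so the keys `expression_id`, `cell_id`, `property_type`, and `value` are assumed names, and `group_expression_by_cell` is a hypothetical helper. Values are passed through untouched, since the schema stores them non-normalized.

```python
from collections import defaultdict

# Recommended property-type keywords quoted from the table; custom values
# are also valid, so unknown keywords are collected, not rejected.
RECOMMENDED_PROPERTY_TYPES = {
    "mrna_expression_by_read_count",
    "protein_expression_by_fluorescence_intensity",
    "antigen_bait_binding_by_fluorescence_intensity",
    "protein_expression_by_dna_barcode_count",
    "antigen_bait_binding_by_dna_barcode_count",
}


def group_expression_by_cell(expressions):
    """Group Expression records by the cell they describe.

    Returns (groups, custom_types): groups maps each cell identifier
    (assumed key "cell_id") to its records in input order; custom_types
    holds property types outside the recommended list.
    """
    groups = defaultdict(list)
    custom_types = set()
    for record in expressions:
        # The cell identifier is required on every Expression record.
        groups[record["cell_id"]].append(record)
        if record["property_type"] not in RECOMMENDED_PROPERTY_TYPES:
            custom_types.add(record["property_type"])
    return dict(groups), custom_types
```

Flagging custom types instead of raising an error keeps the check compatible with the schema's allowance for custom property types while still surfacing typos in the recommended keywords.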