Single-cell Schema

The cell object acts as a point of reference for all data that can be related to an individual cell, either by direct observation or by inference.

File Format Specification

Files are YAML/JSON with an AIRR Data File structure. Files should be encoded as UTF-8. Identifiers are case-sensitive. Files should have the extension .yaml, .yml, or .json.
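
The extension and encoding rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the specification: the names `EXTENSION_FORMATS`, `detect_format`, and `load_json_data_file` are hypothetical, and matching the extensions exactly as listed (lowercase) is an assumption. Only JSON is parsed here because the Python standard library has no YAML parser.

```python
import json
from pathlib import Path

# Extensions listed in the specification, mapped to the serialization they imply.
# Assumption: extensions are matched exactly as written (lowercase).
EXTENSION_FORMATS = {".yaml": "yaml", ".yml": "yaml", ".json": "json"}


def detect_format(filename):
    """Return "yaml" or "json" based on the file extension."""
    suffix = Path(filename).suffix
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"unsupported extension: {suffix!r}")
    return EXTENSION_FORMATS[suffix]


def load_json_data_file(path):
    """Load a JSON-serialized data file, decoding it as UTF-8."""
    if detect_format(path) != "json":
        raise ValueError("this sketch only parses JSON; YAML needs a third-party parser")
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding when reading the file.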

Schema Field Definitions

Cell Fields

| Name | Type | Attributes | Definition |
| --- | --- | --- | --- |
| cell_id | string | required, identifier | Identifier for the Cell object. This identifier must be unique within a given study, but it is recommended that it be a universally unique record locator to enable database applications. |
| repertoire_id | string | required, nullable | Identifier of the associated repertoire in study metadata. |
| data_processing_id | string | optional, nullable | Identifier of the data processing object in the repertoire metadata for this cell. |
| receptors | array of string | optional, nullable | Array of receptor identifiers defined for the Receptor objects associated with this cell. |
| cell_subset | Ontology | optional, nullable | Commonly used designation of the isolated cell population. |
| cell_phenotype | string | optional, nullable | List of cellular markers and their expression levels used to isolate the cell population. |
| cell_label | string | optional, nullable | Free text cell type annotation. Primarily used for annotating cell types that are not provided in the Cell Ontology. |
| virtual_pairing | boolean | required, nullable | Boolean indicating whether the pairing was inferred. |
| cell_type | string | optional, nullable | Cell type (source). Most often the type should be “observed,” meaning it corresponds to an actual physical sample that was sequenced. Other allowed values are “simulated,” i.e. the Cells were generated in silico and there is no linked Subject/Sample metadata, and “inferred” for Cells that are phylogenetically reconstructed from observed sequences. Inferred Cells should be assigned to a Repertoire with repertoire_type=inferred. |
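
As a rough illustration of how the Cell attributes might be checked in practice, the sketch below transcribes the field list into a lookup table and reports problems for a single record. It rests on two assumptions that the text above does not spell out: "required" is read as "the key must be present", and a field that is not marked nullable must not be null. The names `CELL_FIELDS`, `CELL_TYPES`, and `validate_cell`, as well as all identifier values in the example record, are hypothetical.

```python
# Field attributes transcribed from the Cell field list: (required, nullable).
# cell_id is not marked nullable, so it is treated as non-null here.
CELL_FIELDS = {
    "cell_id": (True, False),
    "repertoire_id": (True, True),
    "data_processing_id": (False, True),
    "receptors": (False, True),
    "cell_subset": (False, True),
    "cell_phenotype": (False, True),
    "cell_label": (False, True),
    "virtual_pairing": (True, True),
    "cell_type": (False, True),
}

# Values named in the cell_type definition.
CELL_TYPES = {"observed", "simulated", "inferred"}


def validate_cell(record):
    """Return a list of problems found in one Cell record (empty if none)."""
    problems = []
    for name, (required, nullable) in CELL_FIELDS.items():
        if name not in record:
            if required:
                problems.append(f"missing required field: {name}")
        elif record[name] is None and not nullable:
            problems.append(f"field may not be null: {name}")
    cell_type = record.get("cell_type")
    if cell_type is not None and cell_type not in CELL_TYPES:
        problems.append(f"unexpected cell_type: {cell_type!r}")
    return problems


# Illustrative record; every identifier value is made up.
example_cell = {
    "cell_id": "cell-0001",
    "repertoire_id": "repertoire-01",
    "receptors": ["receptor-0001"],
    "virtual_pairing": False,
    "cell_type": "observed",
}
```

Calling `validate_cell(example_cell)` returns an empty list, while a record that lacks `virtual_pairing` or sets `cell_id` to null yields one message per problem.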

Expression Fields

| Name | Type | Attributes | Definition |
| --- | --- | --- | --- |
| expression_id | string | required, identifier | Identifier for the Expression object. This identifier must be unique within a given study, but it is recommended that it be a universally unique record locator to enable database applications. |
| cell_id | string | required | Identifier of the cell to which this expression data is related. |
| repertoire_id | string | required, nullable | Identifier for the associated repertoire in study metadata. |
| data_processing_id | string | required, nullable | Identifier of the data processing object in the repertoire metadata for this cell. |
| property_type | string | required | Keyword describing the property type and detection method used to measure the property value. The following keywords are recommended, but custom property types are also valid: “mrna_expression_by_read_count”, “protein_expression_by_fluorescence_intensity”, “antigen_bait_binding_by_fluorescence_intensity”, “protein_expression_by_dna_barcode_count” and “antigen_bait_binding_by_dna_barcode_count”. |
| property | Ontology | required, nullable | Name of the property observed, typically a gene or antibody identifier (and label) from a canonical resource such as Ensembl (e.g. ENSG00000275747, IGHV3-79) or Antibody Registry (ABREG:1236456, Purified anti-mouse/rat/human CD27 antibody). |
| value | number | required, nullable | Level at which the property was observed in the experiment (non-normalized). |
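
To show how Expression records relate back to cells through `cell_id`, the following sketch groups raw values per cell and lists any custom property types. It is only an illustration under stated assumptions: an Ontology value is assumed to be a mapping with `id` and `label` keys, the function names `custom_property_types` and `index_by_cell` are hypothetical, and all record values are made up except the Ensembl identifier and label quoted in the property definition.

```python
from collections import defaultdict

# Property types recommended in the property_type definition; custom values are also valid.
RECOMMENDED_PROPERTY_TYPES = {
    "mrna_expression_by_read_count",
    "protein_expression_by_fluorescence_intensity",
    "antigen_bait_binding_by_fluorescence_intensity",
    "protein_expression_by_dna_barcode_count",
    "antigen_bait_binding_by_dna_barcode_count",
}


def custom_property_types(records):
    """Return the property_type values that are not in the recommended list."""
    return sorted({r["property_type"] for r in records} - RECOMMENDED_PROPERTY_TYPES)


def index_by_cell(records):
    """Group raw values as {cell_id: {(property_type, property id): value}}."""
    by_cell = defaultdict(dict)
    for rec in records:
        prop = rec["property"]
        # Assumption: an Ontology value is a mapping with "id" and "label" keys.
        prop_id = prop["id"] if prop is not None else None
        by_cell[rec["cell_id"]][(rec["property_type"], prop_id)] = rec["value"]
    return dict(by_cell)


# Illustrative records; identifiers and values are made up.
example_expression = [
    {
        "expression_id": "expr-0001",
        "cell_id": "cell-0001",
        "repertoire_id": "repertoire-01",
        "data_processing_id": None,
        "property_type": "mrna_expression_by_read_count",
        "property": {"id": "ENSG00000275747", "label": "IGHV3-79"},
        "value": 3,
    },
    {
        "expression_id": "expr-0002",
        "cell_id": "cell-0001",
        "repertoire_id": "repertoire-01",
        "data_processing_id": None,
        "property_type": "my_custom_assay",
        "property": None,
        "value": None,
    },
]
```

Keying on the pair of `property_type` and property identifier keeps, for example, an mRNA read count and a protein barcode count for the same gene from overwriting each other. Values are stored as-is because the schema defines `value` as non-normalized.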