AIRR Ontologies and Vocabularies Team¶
Summary¶
The “Ontologies and Vocabularies Team” was formed as a joint interest group of the Common Repository (ComRepo) and the Minimal Standards (MiniStd) working groups of the AIRR Community. The long-term aim of the Team is to define standard vocabularies and ontologies to be used by AIRR-compliant databases.
Ontology Data Representation¶
The nodes in an ontology are typically either concepts (e.g., capital)
or instances thereof (e.g., Paris). These nodes have local IDs (often
numbers), which are unique within an ontology. They also typically have
labels, which is the human-readable name of the node. Ontology
entities in the AIRR Data Standard reflect this model, with each AIRR
field that is represented as an ontology recorded with a global
ontology ID (id
) and the corresponding label (label
).
Within the AIRR Standards, Compact URIs (CURIEs) are used to represent ontology IDs. CURIEs are a standardized way to abbreviate International Resource Identifiers (IRI, [RFC3987]), which includes URIs as a subset. They were originally conceived to simplify the handling of attributes, e.g. in XML or SPARQL, by making them more compact and readable. CURIEs are also used by IEDB databases to reduce redundancies (mainly in the leading part of IRIs).
For example, a typical CURIE would look like NCBITAXON:9258
. In this
case, NCBITAXON
is the prefix, a custom string that will be
replaced by a repository-defined IRI component (e.g.,
http://purl.obolibrary.org/obo/NCBITaxon_
). Note that there is no
connection between NCBITAXON
in the CURIE and NCBITaxon
in the
IRI, the former one is just a placeholder.
The AIRR schema will provide a list of AIRR approved CURIE prefixes along with a list of at least one IRI prefix (i.e., replacement string) for each them. This list serves two purposes:
It provides a controlled namespace for CURIE prefixes used in the AIRR schema. For now, custom additions to or replacements of these prefixes in the schema are prohibited. This does not affect the ability of repositories to use such custom prefixes internally.
It simplifies resolution of CURIEs by non-repositories. The lists of IRI prefixes for each CURIE prefix should not be considered to be exhaustive. However, when using custom IRI prefixes, it must be ensured that they refer to the same ontology as the provider prefixes.
It should be explicitly noted that the IRI prefix list should not be interpreted as any kind of recommendation for certain providers. It is left up to users to decide how to resolve the resulting IRIs, e.g., via DNS/HTTP (if possible) or by using a provider of their choice.
Approved Ontologies¶
Cell ontology (CL)
used in:
Cell subset (
cell_subset
, Tissue and Cell Processing)
CURIE summary
CURIE Prefix:
CL
CURIE IRI Prefix:
http://purl.obolibrary.org/obo/CL_
example AIRR use
“cell_subset.id” : “CL:0000542”
“cell_subset.label” : “lymphocyte”
default root node
label:
lymphocyte
local id:
CL_0000542
path: ``
license: CC BY
latest release (as of 2020-05-20): 2020-03-02
maintainer: Alexander Diehl, Buffalo, NY, US (addiehl@buffalo.edu)
Human disease ontology (DOID)
used in:
Diagnosis (
disease_diagnosis
, Diagnosis)
CURIE summary
CURIE Prefix:
DOID
CURIE IRI Prefix:
http://purl.obolibrary.org/obo/DOID_
example AIRR use
“disease_diagnosis.id” : “DOID:9538”
“disease_diagnosis.label” : “multiple myeloma”
default root node
label:
disease
local ID:
DOID:4
path:
disease
license: CC0
latest release (as of 2020-05-20): 2020-04-20
repo: https://github.com/DiseaseOntology/HumanDiseaseOntology
maintainer: Lynn Schriml, U Maryland, MD, US (lynn.schriml@gmail.com)
notes: Features ICD cross-reference
NCBI organismal taxonomy (NCBITAXON)
used in:
Species (
species
, Subject)Cell species (
cell_species
, Tissue and Cell Processing)
CURIE summary
CURIE Prefix:
NCBITAXON
CURIE IRI Prefixes:
http://purl.obolibrary.org/obo/NCBITaxon_
,http://purl.bioontology.org/ontology/NCBITAXON/
example AIRR use
“species.id” : “NCBITAXON:9606”
“species.label” : “Homo sapiens”
default root node
label:
Gnathostomata
local ID:
7776
path:
cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Deuterostomia/Chordata/Craniata/Vertebrata/Gnathostomata
license: UMLS
latest release (as of 2020-05-20): 2020-04-18
maintainer: NCBI (info@ncbi.nlm.nih.gov)
NCI thesaurus (NCIT)
used in:
Study type (
study_type
, Study)
CURIE summary
CURIE Prefix:
NCIT
CURIE IRI Prefixes:
http://purl.obolibrary.org/obo/NCIT_
,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#
example AIRR use
“study_type.id” : “NCIT:C15197”
“study_type.label” : “Case-Control Study”
default root node
label:
Study
local ID:
C63536
path:
Activity/Clinical or Research Activity/ Research Activity/Study
license: Public domain, credit of NCI is requested
repo: https://github.com/NCI-Thesaurus/thesaurus-obo-edition
latest release (as of 2020-05-20): 2020-05-04
maintainer: NCI (ncicbiitappssupport@mail.nih.gov)
Units of measurement ontology (UO)
used in:
Age unit (
age_unit
, Subject)
CURIE summary
CURIE Prefix:
UO
CURIE IRI Prefix:
http://purl.obolibrary.org/obo/UO_
example AIRR use
“age_unit.id” : “UO:0000036”
“age_unit.label” : “year”
default root node
label:
time unit
local ID:
UO_0000003
path:
unit/time unit
license: CC BY (per Github repo)
repo: https://github.com/bio-ontology-research-group/unit-ontology
latest release (as of 2020-05-20): 2020-05-18
maintainer: unknown
Uber-anatomy ontology (Uberon)
used in:
Tissue (
tissue
, Sample)
CURIE summary
CURIE Prefix:
UBERON
CURIE IRI Prefix:
http://purl.obolibrary.org/obo/UBERON_
example AIRR use
“tissue.id” : “UBERON:0002371”
“tissue.label” : “bone marrow”
default root node
label:
multicellular anatomical structure
local ID:
UBERON:0010000
path:
/BFO_0000002/BFO_0000004/anatomical entity/material anatomical entity/anatomical structure/multicellular anatomical structure
license: CC BY
latest release (as of 2020-05-20): 2019-11-22
maintainer: Chris Mungall, LBL, CA, US (cjmungall@lbl.gov)
Sprint Reports¶
- RFC3987
Internationalized Resource Identifiers (IRIs). `DOI:10.17487/RFC3987`_