Scientific Query Scenarios for AIRR Data Commons API
The AIRR Common Repository Working Group (CRWG) has defined a number of sample scientific query scenarios to guide the design of the ADC API. The Design Decisions document lists the major design choices for the API, and the API is currently defined using the OpenAPI V2.0 Specification. This document describes the query examples with associated JSON definitions that can be submitted to an AIRR repository.
There are two main query endpoints in the API: /repertoire for querying MiAIRR-compliant study metadata and /rearrangement for querying rearrangement annotations. Most scientific queries will involve both endpoints. The basic workflow has two steps. First, query /repertoire to get the list of repertoires that meet the search criteria on study, subject, and sample metadata. Second, pass the identifiers of those repertoires to the /rearrangement endpoint along with any search criteria on the rearrangement annotations. The resulting rearrangements can be downloaded as JSON or in the AIRR TSV format.
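The two-step workflow can be sketched in Python as follows. This is a minimal illustration rather than a reference client: `BASE_URL` is a placeholder, the helper names (`http_post`, `find_repertoire_ids`, `fetch_rearrangements`) are hypothetical, and the code assumes the /repertoire response lists its matches under a `Repertoire` key. The transport function is passed in as a parameter so the logic can be exercised without a live repository.

```python
import json
import urllib.request

# Placeholder base URL; substitute the repository you are querying.
BASE_URL = "https://repository.example.org/airr/v1"


def http_post(endpoint, query):
    """POST a JSON query to BASE_URL + endpoint and return the response body as text."""
    request = urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def find_repertoire_ids(repertoire_filters, post=http_post):
    """Step 1: query /repertoire and collect the matching repertoire_id values."""
    body = json.loads(post("/repertoire", {"filters": repertoire_filters}))
    # Assumption: matching repertoires are listed under the "Repertoire" key.
    return [rep["repertoire_id"] for rep in body.get("Repertoire", [])]


def fetch_rearrangements(repertoire_ids, rearrangement_filter, post=http_post, fmt="tsv"):
    """Step 2: query /rearrangement, restricted to the repertoires from step 1."""
    query = {
        "filters": {
            "op": "and",
            "content": [
                {"op": "in", "content": {"field": "repertoire_id", "value": repertoire_ids}},
                rearrangement_filter,
            ],
        },
        "format": fmt,
    }
    return post("/rearrangement", query)
```

Calling `find_repertoire_ids(filters)` and then `fetch_rearrangements(ids, junction_filter)` mirrors the workflow described above; passing a stub as `post` makes it possible to inspect the generated queries offline.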
Query Example 1
What human full length TCR-beta sequences have junction amino acid sequence: “CASSYIKLN”?
The JSON query definition for the /repertoire endpoint. The ontology identifier 9606 requests human and TRB is the locus of interest.
{
    "filters":{
        "op":"and",
        "content": [
            {
                "op":"=",
                "content": {
                    "field":"subject.species.id",
                    "value":"NCBITAXON:9606"
                }
            },
            {
                "op":"=",
                "content": {
                    "field":"sample.pcr_target.pcr_target_locus",
                    "value":"TRB"
                }
            }
        ]
    }
}
That query does not request full length sequences. We can enhance the query by adding a clause for the sample.complete_sequences field.
{
    "filters":{
        "op":"and",
        "content": [
            {
                "op":"=",
                "content": {
                    "field":"subject.species.id",
                    "value":"NCBITAXON:9606"
                }
            },
            {
                "op":"=",
                "content": {
                    "field":"sample.pcr_target.pcr_target_locus",
                    "value":"TRB"
                }
            },
            {
                "op":"or",
                "content": [
                    {
                        "op":"=",
                        "content": {
                            "field":"sample.complete_sequences",
                            "value":"complete"
                        }
                    },
                    {
                        "op":"=",
                        "content": {
                            "field":"sample.complete_sequences",
                            "value":"complete+untemplated"
                        }
                    }
                ]
            }
        ]
    }
}
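Nested filters like the one above become easier to read and maintain when they are composed from small helpers. The following Python sketch rebuilds the same /repertoire query; the helper names `clause` and `combine` are hypothetical and are not part of the API itself.

```python
import json


def clause(op, field, value):
    """Build a single comparison clause, for example field = value."""
    return {"op": op, "content": {"field": field, "value": value}}


def combine(op, *clauses):
    """Join clauses with a logical operator ("and" or "or")."""
    return {"op": op, "content": list(clauses)}


# Either value of sample.complete_sequences counts as full length here.
full_length = combine(
    "or",
    clause("=", "sample.complete_sequences", "complete"),
    clause("=", "sample.complete_sequences", "complete+untemplated"),
)

repertoire_query = {
    "filters": combine(
        "and",
        clause("=", "subject.species.id", "NCBITAXON:9606"),
        clause("=", "sample.pcr_target.pcr_target_locus", "TRB"),
        full_length,
    )
}

# Serialize to the JSON document that would be submitted to /repertoire.
print(json.dumps(repertoire_query, indent=2))
```

Keeping the "or" group in its own variable makes it straightforward to reuse the full-length condition in other repertoire queries.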
The JSON query definition for the /rearrangement endpoint. The repertoire identifiers (repertoire_id) in the query are just examples; you would replace them with the actual identifiers returned from the above repertoire query. The query performs an exact match of the junction amino acid sequence.
{
    "filters":{
        "op":"and",
        "content": [
            {
                "op":"in",
                "content": {
                    "field":"repertoire_id",
                    "value":[
                        "2603354229190496746-242ac113-0001-012",
                        "2618085967015776746-242ac113-0001-012",
                        "2633633748627296746-242ac113-0001-012",
                        "2564613624180576746-242ac113-0001-012"
                    ]
                }
            },
            {
                "op":"=",
                "content": {
                    "field":"junction_aa",
                    "value":"CASSYIKLN"
                }
            }
        ]
    },
    "fields":["repertoire_id","sequence_id","v_call","productive"],
    "format":"tsv"
}
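Because the query requests the tsv format with four fields, the response can be read with a standard tab-separated parser. The sketch below is one way to do that in Python; the function name `parse_rearrangement_tsv` is hypothetical, the sample rows are made up purely for illustration, and the code assumes boolean columns use the AIRR TSV convention of T and F (lowercase true/false is also accepted, in case a repository emits that form).

```python
import csv
import io

# Accepted spellings for boolean columns; anything else maps to None (unknown).
BOOLEAN_VALUES = {"T": True, "F": False, "true": True, "false": False}


def parse_rearrangement_tsv(text):
    """Parse a TSV response body into a list of dicts, one per rearrangement."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        # Convert the productive flag from text to a Python bool (or None).
        row["productive"] = BOOLEAN_VALUES.get(row.get("productive", ""))
        rows.append(row)
    return rows


# Made-up rows for illustration only; this is not real repository output.
example = (
    "repertoire_id\tsequence_id\tv_call\tproductive\n"
    "rep-1\tseq-1\tTRBV6-5*01\tT\n"
    "rep-1\tseq-2\tTRBV7-9*01\tF\n"
)

rows = parse_rearrangement_tsv(example)
# Keep only the identifiers of rearrangements flagged as productive.
productive_ids = [row["sequence_id"] for row in rows if row["productive"]]
```

The column names match the fields list in the query, so changing that list changes the keys available in each parsed row.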
Query Example 2
What human full length IgH sequences have been found in patients with an autoimmune diagnosis?
TO BE WRITTEN
Query Example 3
What is the antibody IG heavy chain V usage in people who have diabetes?
TO BE WRITTEN
Query Example 4
Give me all the anti-HIV antibody sequences that use IGHV1-69 in HIV-infected individuals.
TO BE WRITTEN
Query Example 5
Repertoires from cancer patients where we have pre- and post-immunotherapy peripheral blood (or tumor biopsy).
TO BE WRITTEN
Query Example 6
Return TCRs that score highly on a position weight matrix from subjects with a particular HLA allele that have been infected with TB.
TO BE WRITTEN
Query Example 7
Repertoires from female patients with cancer.
TO BE WRITTEN