Scientific Query Scenarios for AIRR Data Commons API

The AIRR Common Repository Working Group (CRWG) has defined a number of sample scientific query scenarios to guide the design of the ADC API. The Design Decisions document lists the major design choices for the API, and the API is currently defined using the OpenAPI V2.0 Specification. This document describes the query examples with associated JSON definitions that can be submitted to an AIRR repository.

There are two main query endpoints in the API: /repertoire for querying MiAIRR-compliant study metadata and /rearrangement for querying rearrangement annotations. Most scientific queries involve both endpoints, and the basic workflow has two steps. First, query /repertoire to get the list of repertoires that meet the search criteria on study, subject, and sample metadata. Second, pass the identifiers of those repertoires to the /rearrangement endpoint, along with any search criteria on the rearrangement annotations. The resulting rearrangements can be downloaded as JSON or in the AIRR TSV format.
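The two-step workflow can be sketched in Python as follows. This is a minimal illustration, not part of the API definition. The function name `find_rearrangements` and the `post` callable are hypothetical, and the sketch assumes that the /repertoire response lists matching repertoires under a "Repertoire" key, each carrying a `repertoire_id`. The transport is passed in as a function so that no particular HTTP client or repository URL is implied.

```python
from typing import Any, Callable, Dict

Query = Dict[str, Any]
# Hypothetical transport: takes an endpoint path and a query body,
# and returns the decoded JSON response as a dictionary.
Post = Callable[[str, Query], Dict[str, Any]]


def find_rearrangements(post: Post, repertoire_query: Query,
                        rearrangement_filter: Query) -> Dict[str, Any]:
    """Query /repertoire first, then /rearrangement for the matching repertoires."""
    # Step 1: find the repertoires that match the metadata criteria.
    response = post("/repertoire", repertoire_query)
    # Assumption: repertoires are returned as a list under the "Repertoire" key.
    repertoire_ids = [r["repertoire_id"] for r in response.get("Repertoire", [])]
    if not repertoire_ids:
        # Nothing matched, so there is no point in sending the second query.
        return {"Rearrangement": []}

    # Step 2: restrict the rearrangement search to those repertoires and
    # combine that restriction with the caller's rearrangement criteria.
    query = {
        "filters": {
            "op": "and",
            "content": [
                {"op": "in",
                 "content": {"field": "repertoire_id", "value": repertoire_ids}},
                rearrangement_filter,
            ],
        }
    }
    return post("/rearrangement", query)
```

In practice `post` would wrap whatever HTTP client you use to send the JSON body to the repository of your choice; in a test it can be a stub that returns canned responses.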

Query Example

What human full-length TCR-beta sequences have the junction amino acid sequence “CASSYIKLN”?

  • The JSON query definition for the /repertoire endpoint. The ontology identifier NCBITAXON:9606 requests human, and TRB is the locus of interest.

{
    "filters":{
        "op":"and",
        "content": [
            {
                "op":"=",
                "content": {
                    "field":"subject.species.id",
                    "value":"NCBITAXON:9606"
                }
            },
            {
                "op":"=",
                "content": {
                    "field":"sample.pcr_target.pcr_target_locus",
                    "value":"TRB"
                }
            }
        ]
    }
}
  • The query above does not restrict the results to full-length sequences. We can refine it by adding a clause on the sample.complete_sequences field that accepts either complete or complete+untemplated.

{
    "filters":{
        "op":"and",
        "content": [{
            "op":"=",
            "content": {
                "field":"subject.species.id",
                "value":"NCBITAXON:9606"
            }
        },
        {
            "op":"=",
            "content": {
                "field":"sample.pcr_target.pcr_target_locus",
                "value":"TRB"
            }
        },
        {
            "op":"or",
            "content":[{
                "op":"=",
                "content": {
                    "field":"sample.complete_sequences",
                    "value":"complete"
                }
            },
            {
                "op":"=",
                "content": {
                    "field":"sample.complete_sequences",
                    "value":"complete+untemplated"
                }
            }]
        }]
    }
}
  • The JSON query definition for the /rearrangement endpoint. The repertoire identifiers (repertoire_id) in the query are only examples; replace them with the actual identifiers returned by the repertoire query above. The query performs an exact match on the junction amino acid sequence.

{
    "filters":{
        "op":"and",
        "content": [
            {
                "op":"in",
                "content": {
                    "field":"repertoire_id",
                    "value":[
                        "2603354229190496746-242ac113-0001-012",
                        "2618085967015776746-242ac113-0001-012",
                        "2633633748627296746-242ac113-0001-012",
                        "2564613624180576746-242ac113-0001-012"
                    ]
                }
            },
            {
                "op":"=",
                "content": {
                    "field":"junction_aa",
                    "value":"CASSYIKLN"
                }
            }
        ]
    },
    "fields":["repertoire_id","sequence_id","v_call","productive"],
    "format":"tsv"
}
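The query definitions above share a small, regular structure: a leaf clause has an "op" and a "content" object with "field" and "value", and a logical clause has an "op" and a list of nested clauses. As a rough sketch, the Python helpers below build the refined repertoire query and the rearrangement query for this scenario. The helper names (`clause`, `combine`, `repertoire_query`, `rearrangement_query`) are hypothetical and are not part of the API or of any AIRR library.

```python
import json
from typing import Any, Dict, Iterable, List

Filter = Dict[str, Any]


def clause(op: str, field: str, value: Any) -> Filter:
    """Build a leaf filter such as {"op": "=", "content": {"field": ..., "value": ...}}."""
    return {"op": op, "content": {"field": field, "value": value}}


def combine(op: str, clauses: List[Filter]) -> Filter:
    """Join several filters with a logical operator ("and" or "or")."""
    return {"op": op, "content": clauses}


def repertoire_query() -> Dict[str, Any]:
    """Human, TRB locus, full-length sequences (the refined /repertoire query)."""
    full_length = combine("or", [
        clause("=", "sample.complete_sequences", "complete"),
        clause("=", "sample.complete_sequences", "complete+untemplated"),
    ])
    return {
        "filters": combine("and", [
            clause("=", "subject.species.id", "NCBITAXON:9606"),
            clause("=", "sample.pcr_target.pcr_target_locus", "TRB"),
            full_length,
        ])
    }


def rearrangement_query(repertoire_ids: Iterable[str],
                        junction_aa: str) -> Dict[str, Any]:
    """Exact junction_aa match, restricted to the given repertoires, as TSV."""
    return {
        "filters": combine("and", [
            clause("in", "repertoire_id", list(repertoire_ids)),
            clause("=", "junction_aa", junction_aa),
        ]),
        "fields": ["repertoire_id", "sequence_id", "v_call", "productive"],
        "format": "tsv",
    }


if __name__ == "__main__":
    # The identifier below is one of the example values from this page;
    # use the identifiers returned by your own /repertoire query instead.
    ids = ["2603354229190496746-242ac113-0001-012"]
    print(json.dumps(repertoire_query(), indent=4))
    print(json.dumps(rearrangement_query(ids, "CASSYIKLN"), indent=4))
```

Building the queries from helpers rather than hand-editing JSON keeps the nesting consistent and makes it easy to swap in the repertoire identifiers returned by the first step.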