AIRR Data Commons API V1

The use of high-throughput sequencing for profiling B-cell and T-cell receptors has resulted in a rapid increase in data generation. It is timely, therefore, for the Adaptive Immune Receptor Repertoire (AIRR) community to establish a clear set of community-accepted data and metadata standards; analytical tools; and policies and practices for infrastructure to support data deposit, curation, storage, and use. Such actions are in accordance with international funder and journal policies that promote data deposition and data sharing – at a minimum, data on which scientific publications are based should be made available immediately on publication. Data deposit in publicly accessible databases ensures that published results may be validated. Such deposition also facilitates reuse of data for the generation of new hypotheses and new knowledge.

The AIRR Common Repository Working Group (CRWG) developed a set of recommendations (v0.6.0) that promote the deposit, sharing, and use of AIRR sequence data. These recommendations were refined following community discussions at the AIRR 2016 and 2017 Community Meetings and were approved through a vote by the AIRR Community at the AIRR Community Meeting in December 2017.

Overview

The AIRR Data Commons (ADC) API provides programmatic access to query and download AIRR-seq data. The ADC API uses JSON as its communication format, and standard HTTP methods like GET and POST. The ADC API is read-only and the mechanism of inclusion of AIRR-seq studies into a data repository is left up to the repository.

This documentation explains how to construct and execute API requests and interpret API responses.

API Endpoints

The ADC API is versioned with the version number (v1) as part of the base path for all endpoints. Each ADC API endpoint represents specific functionality as summarized in the following table:

Endpoint	Type	HTTP	Description
`/v1`	Service status	`GET`	Returns success if API service is running.
`/v1/info`	Service information	`GET`	Upon success, returns service information such as name, version, etc.
`/v1/repertoire/{repertoire_id}`	Retrieve a repertoire given its `repertoire_id`	`GET`	Upon success, returns the `Repertoire` information in JSON according to the Repertoire schema.
`/v1/repertoire`	Query repertoires	`POST`	Upon success, returns a list of `Repertoires` in JSON according to the Repertoire schema.
`/v1/rearrangement/{sequence_id}`	Retrieve a rearrangement given its `sequence_id`	`GET`	Upon success, returns the `Rearrangement` information in JSON format according to the Rearrangement schema.
`/v1/rearrangement`	Query rearrangements	`POST`	Upon success, returns a list of `Rearrangements` in JSON or AIRR TSV format according to the Rearrangement schema.

Authentication

The ADC API currently does not define an authentication method. Future versions of the API will provide an authentication method so data repositories can support query and download of controlled-access data.

Search and Retrieval

The AIRR Data Commons API specifies endpoints for searching and retrieving AIRR-seq data sets stored in an AIRR-compliant Data Repository according to the AIRR Data Model. This documentation describes Version 1 of the API. The general format of requests and associated parameters are described below.

The design of the AIRR Data Commons API was greatly inspired by the National Cancer Institute’s Genomic Data Commons (GDC) API.

Components of a Request

The ADC API has two classes of endpoints. The endpoints that respond to GET requests are simple services that require few or no parameters. The endpoints that respond to POST requests are the main query services and provide many parameters for specifying the query as well as the data in the API response.

A typical POST query request specifies the following parameters:

The filters parameter specifies the query.
The from and size parameters specify the number of results to skip and the maximum number of results to be returned in the response.
The fields parameter specifies which data elements are to be returned in the response. By default all fields (AIRR and non-AIRR) stored in the data repository are returned. This default can vary between data repositories based upon how the repository decides to store blank or null fields, so the fields and/or include_fields parameter should be used to guarantee the existence of data elements in the response.
The include_fields parameter specifies the set of AIRR fields to be included in the response. This parameter can be used in conjunction with the fields parameter, in which case the list of fields is merged. This is a mechanism to ensure that specific, well-defined sets of AIRR data elements are returned without requiring all of those fields to be individually provided in the fields parameter.

The sets that can be requested are summarized in the table below.

include_fields	MiAIRR	AIRR required	AIRR identifiers	other AIRR fields
miairr	Y	some	N	N
airr-core	Y	Y	Y	N
airr-schema	Y	Y	Y	Y
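As a rough sketch of how these parameters fit together, the Python snippet below assembles a POST body as a plain dictionary. The helper name `build_query` and its argument names are hypothetical and are not part of the ADC API or the AIRR reference libraries; only the JSON keys (`filters`, `from`, `size`, `fields`, `include_fields`) come from this documentation.

```python
import json


def build_query(filters=None, start=0, size=None, fields=None, include_fields=None):
    """Assemble an ADC API POST body (hypothetical helper, not a library function)."""
    body = {}
    if filters is not None:
        body["filters"] = filters
    if start:
        body["from"] = start  # number of results to skip; 0 is the default
    if size is not None:
        body["size"] = size  # maximum number of results to return
    if fields:
        body["fields"] = list(fields)
    if include_fields is not None:
        body["include_fields"] = include_fields  # miairr, airr-core or airr-schema
    return body


query = build_query(
    filters={"op": "=", "content": {"field": "subject.organism.id", "value": "9606"}},
    start=10,
    size=5,
    fields=["repertoire_id"],
    include_fields="airr-core",
)
print(json.dumps(query, indent=2))
```

The printed JSON could be saved to a file and sent with `curl --data @file.json` in the same way as the examples in this documentation.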

Service Status Example

The following is an example GET request to check that the service API is available for VDJServer’s data repository.

curl https://vdjserver.org/airr/v1

The response should indicate success.

{"result":"success"}

Service Info Example

The following is an example GET request to get information about the service.

curl https://vdjserver.org/airr/v1/info

The response provides various information.

{
  "name": "adc-api-js-mongodb",
  "description": "AIRR Data Commons API reference implementation",
  "version": "1.0.0",
  "airr_schema_version": 1.3,
  "max_size": 1000,
  "max_query_size": 2097152,
  "contact": {
    "name": "AIRR Community",
    "url": "https://github.com/airr-community"
  }
}

Query Repertoire Example

The following is an example POST request to the repertoire endpoint of the ADC API. It queries for repertoires of human TCR beta receptors (filters), skips the first 10 results (from), requests 5 results (size), and requests only the repertoire_id field (fields).

curl --data @query1-2_repertoire.json https://vdjserver.org/airr/v1/repertoire

The content of the JSON payload.

{
    "filters":{
        "op":"and",
        "content": [
            {
                "op":"=",
                "content": {
                    "field":"subject.organism.id",
                    "value":"9606"
                }
            },
            {
                "op":"=",
                "content": {
                    "field":"sample.pcr_target.pcr_target_locus",
                    "value":"TRB"
                }
            }
        ]
    },
    "from":10,
    "size":5,
    "fields":["repertoire_id"]
}

The response contains two JSON objects, an Info object that provides information about the API response and a Repertoire object that contains the list of Repertoires that met the query search criteria. In this case, the query returns a list of five repertoire identifiers. Note the Info object is based on the info block as specified in the OpenAPI v2.0 specification.

{
  "Info":
  {
      "title": "AIRR Data Commons API reference implementation",
      "description": "API response for repertoire query",
      "version": 1.3,
      "contact":
      {
          "name": "AIRR Community",
          "url": "https://github.com/airr-community"
      }
  },
  "Repertoire":
  [
      {"repertoire_id": "4357957907784536551-242ac11c-0001-012"},
      {"repertoire_id": "4476756703191896551-242ac11c-0001-012"},
      {"repertoire_id": "6205695788196696551-242ac11c-0001-012"},
      {"repertoire_id": "6393557657723736551-242ac11c-0001-012"},
      {"repertoire_id": "7158276584776536551-242ac11c-0001-012"}
  ]
}
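A minimal sketch of reading such a response in Python is given below. It uses only the standard `json` module, and the response text is a shortened copy of the example above rather than live output from a repository.

```python
import json

# Shortened copy of the example response; a real response would come from the API.
response_text = """
{
  "Info": {
    "title": "AIRR Data Commons API reference implementation",
    "description": "API response for repertoire query",
    "version": 1.3
  },
  "Repertoire": [
    {"repertoire_id": "4357957907784536551-242ac11c-0001-012"},
    {"repertoire_id": "4476756703191896551-242ac11c-0001-012"}
  ]
}
"""

response = json.loads(response_text)

# The Info object describes the response; the Repertoire list holds the results.
repertoire_ids = [rep["repertoire_id"] for rep in response["Repertoire"]]
print(response["Info"]["description"])
print(repertoire_ids)
```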

Endpoints

The ADC API V1 provides two primary endpoints for querying and retrieving AIRR-seq data. The repertoire endpoint allows querying upon any field in the Repertoire schema including study, subject, sample, cell processing, nucleic acid processing, sequencing run, raw sequencing files, and data processing information. Queries on the content of raw sequencing files are not supported, but queries on file attributes such as name, type and read information are. Queries on Rearrangements are provided by the rearrangement endpoint.

The standard workflow to retrieve all of the data for an AIRR-seq study involves performing a query on the repertoire endpoint to retrieve the repertoires in the study, and one or more queries on the rearrangement endpoint to download the rearrangement data for each repertoire. The endpoints are designed so the API response can be saved directly into a file and be used by AIRR analysis tools, including the AIRR python and R reference libraries, without requiring modifications or transformation of the data.
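The workflow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference client: `download_study` is a hypothetical helper, `post(endpoint, body)` is a caller-supplied function that would send the JSON body to the named endpoint and return the decoded response, and `fake_post` is a stand-in with made-up identifiers so the example runs without a network connection. Stopping when a batch comes back shorter than requested is also an assumption about repository behaviour.

```python
def download_study(post, study_id, batch_size=1000):
    """Collect the rearrangements for every repertoire in a study (sketch)."""
    rep_query = {
        "filters": {"op": "=", "content": {"field": "study.study_id", "value": study_id}},
        "fields": ["repertoire_id"],
    }
    repertoires = post("repertoire", rep_query)["Repertoire"]

    rearrangements = {}
    for rep in repertoires:
        rep_id = rep["repertoire_id"]
        records, start = [], 0
        while True:
            rearr_query = {
                "filters": {"op": "=", "content": {"field": "repertoire_id", "value": rep_id}},
                "from": start,
                "size": batch_size,
            }
            batch = post("rearrangement", rearr_query)["Rearrangement"]
            records.extend(batch)
            if len(batch) < batch_size:  # assumed end-of-results signal
                break
            start += batch_size
        rearrangements[rep_id] = records
    return rearrangements


def fake_post(endpoint, body):
    """Stand-in for a real HTTP POST; the identifiers here are made up."""
    if endpoint == "repertoire":
        return {"Repertoire": [{"repertoire_id": "r1"}, {"repertoire_id": "r2"}]}
    data = {"r1": [{"sequence_id": f"s{i}"} for i in range(5)], "r2": []}
    rep_id = body["filters"]["content"]["value"]
    start, size = body["from"], body["size"]
    return {"Rearrangement": data[rep_id][start:start + size]}


result = download_study(fake_post, "PRJNA300878", batch_size=2)
print({rep_id: len(records) for rep_id, records in result.items()})
```

In practice `batch_size` would need to respect the limits a repository reports through its info endpoint.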

Repertoire Endpoint

The repertoire endpoint provides access to all fields in the Repertoire schema. There are two types of endpoints: one for retrieving a single repertoire given its identifier, and another for performing a query across all repertoires in the data repository.

It is expected that the number of repertoires in a data repository will never become so large that queries become computationally expensive. A data repository might have thousands of repertoires across hundreds of studies, yet such numbers are easily handled by modern databases. Based upon this, the ADC API does not place limits on the repertoire endpoint for the fields that can be queried, the operators that can be used, or the number of results that can be returned.

Retrieve a Single Repertoire

Given a repertoire_id, a single Repertoire object will be returned.

curl https://vdjserver.org/airr/v1/repertoire/4357957907784536551-242ac11c-0001-012

The response will provide the Repertoire data in JSON format.

{
  "Info":
  {
      "title": "AIRR Data Commons API reference implementation",
      "description": "API response for repertoire query",
      "version": 1.3,
      "contact":
      {
          "name": "AIRR Community",
          "url": "https://github.com/airr-community"
      }
  },
  "Repertoire":
  [
    {
      "repertoire_id":"4357957907784536551-242ac11c-0001-012",
      "study":{
         "study_id":"PRJNA300878",
         "submitted_by":"Florian Rubelt",
         "pub_ids":"PMID:27005435",
         "lab_name":"Mark M. Davis",
         "lab_address":"Stanford University",
         "study_title":"Homo sapiens B and T cell repertoire - MZ twins"
      },
      "subject":{
         "subject_id":"TW02A",
         "synthetic":false,
         "linked_subjects":"TW02B",
         "organism":{"id":"9606","value":"Homo sapiens"},
         "age":"25yr",
         "link_type":"twin",
         "sex":"F"
      },
      "sample":[
        {"sample_id":"TW02A_T_memory_CD4",
         "pcr_target":[{"pcr_target_locus":"TRB"}],
         "cell_isolation":"FACS",
         "read_length":"300",
         "cell_phenotype":"expression of CD45RO and CCR7",
         "cell_subset":"Memory CD4+ T cell",
         "filename":"SRR2905669_R1.fastq.gz",
         "single_cell":false,
         "file_type":"fastq",
         "tissue":"PBMC",
         "template_class":"RNA",
         "paired_filename":"SRR2905669_R2.fastq.gz",
         "paired_read_direction":"reverse",
         "read_direction":"forward",
         "sequencing_platform":"Illumina MiSeq"}
      ],
      "data_processing":[
        {"data_processing_id":"4976322832749171176-242ac11c-0001-012",
         "analysis_provenance_id":"651223970338378216-242ac11b-0001-007"}
      ]
    }
  ]
}

Query against all Repertoires

A query in JSON format is passed in a POST request. This example queries for repertoires of human IG heavy chain receptors for all studies in the data repository.

curl --data @query2_repertoire.json https://vdjserver.org/airr/v1/repertoire

The content of the JSON payload.

{
    "filters":{
        "op":"and",
        "content": [
            {
                "op":"=",
                "content": {
                    "field":"subject.organism.id",
                    "value":"9606"
                }
            },
            {
                "op":"=",
                "content": {
                    "field":"sample.pcr_target.pcr_target_locus",
                    "value":"IGH"
                }
            }
        ]
    }
}

The response will provide a list of Repertoires in JSON format. The example output is not provided here due to its size.

Rearrangement Endpoint

The rearrangement endpoint provides access to all fields in the Rearrangement schema. There are two types of endpoints: one for retrieving a single rearrangement given its identifier, and another for performing a query across all rearrangements in the data repository.

Unlike repertoire data, data repositories are expected to store millions or billions of rearrangement records, where performing “simple” queries can quickly become computationally expensive. Data repositories will need to optimize their databases for performance. Therefore, the ADC API does not require that all fields be queryable and only a limited set of query capabilities must be supported. The queryable fields are described in the Fields section below.

Retrieve a Single Rearrangement

Given a sequence_id, a single Rearrangement object will be returned.

curl https://vdjserver.org/airr/v1/rearrangement/5d6fba725dca5569326aa104

The response will provide the Rearrangement data in JSON format.

{
  "Info":
  {
      "title": "AIRR Data Commons API reference implementation",
      "description": "API response for rearrangement query",
      "version": 1.3,
      "contact":
      {
          "name": "AIRR Community",
          "url": "https://github.com/airr-community"
      }
  },
  "Rearrangement":
  [
    {
      "sequence_id":"5d6fba725dca5569326aa104",
      "repertoire_id":"1841923116114776551-242ac11c-0001-012",

      "... remaining fields":"snipped for space"
    }
  ]
}

Query against all Rearrangements

Supplying a repertoire_id, when it is known, should greatly speed up the query as it can significantly reduce the amount of data to be searched, though it isn’t necessary.

This example queries for rearrangements with a specific junction amino acid sequence among a set of repertoires. A limited set of fields is requested to be returned. The resultant data can be requested in JSON or AIRR TSV format.

curl --data @query1_rearrangement.json https://vdjserver.org/airr/v1/rearrangement

The content of the JSON payload.

{
    "filters":{
        "op":"and",
        "content": [
            {
                "op":"in",
                "content": {
                    "field":"repertoire_id",
                    "value":[
                        "2366080924918616551-242ac11c-0001-012",
                        "2541616238306136551-242ac11c-0001-012",
                        "1993707260355416551-242ac11c-0001-012",
                        "1841923116114776551-242ac11c-0001-012"
                    ]
                }
            },
            {
                "op":"=",
                "content": {
                    "field":"junction_aa",
                    "value":"CARDPRSYHAFDIW"
                }
            }
        ]
    },
    "fields":["repertoire_id","sequence_id","v_call","productive"],
    "format":"tsv"
}

Here is the response in AIRR TSV format.

productive    v_call  sequence_id     repertoire_id
true  IGHV1-69*04     5d6fba725dca5569326aa106        1841923116114776551-242ac11c-0001-012
true  IGHV1-69*04     5d6fba725dca5569326aa11b        1841923116114776551-242ac11c-0001-012
true  IGHV1-69*10     5d6fba725dca5569326aa149        1841923116114776551-242ac11c-0001-012
true  IGHV1-69*04     5d6fba735dca5569326aa245        1841923116114776551-242ac11c-0001-012
true  IGHV1-69*04     5d6fba735dca5569326aa274        1841923116114776551-242ac11c-0001-012
true  IGHV1-69*04     5d6fba735dca5569326aa27b        1841923116114776551-242ac11c-0001-012
true  IGHV1-69*04     5d6fba735dca5569326aa27c        1841923116114776551-242ac11c-0001-012
true  IGHV1-24*01     5d6fba735dca5569326aa2a0        1841923116114776551-242ac11c-0001-012
true  IGHV1-69*04     5d6fba745dca5569326aa359        1841923116114776551-242ac11c-0001-012
true  IGHV1-69*04     5d6fba745dca5569326aa408        1841923116114776551-242ac11c-0001-012

Request Parameters

The ADC API supports the following query parameters. These are only applicable to the repertoire and rearrangement query endpoints, i.e. the HTTP POST endpoints.

Parameter	Default	Description
`filters`	null	Specifies logical expression for query criteria
`format`	JSON	Specifies the API response format: JSON, AIRR TSV
`include_fields`	null	Specifies the set of AIRR fields to be included in the response
`fields`	null	Specifies which fields to include in the response
`from`	0	Specifies the first record to return from a set of search results
`size`	repository dependent	Specifies the number of results to return
`facets`	null	Provide aggregate count information for the specified fields

Filters Query Parameter

The filters parameter enables passing complex query criteria to the ADC API. The parameter represents the query in a JSON object.

A filters query consists of an operator (or a nested set of operators) with a set of field and value operands. The query criteria as represented in a JSON object can be considered an expression tree data structure where internal nodes are operators and child nodes are operands. The expression tree can be of any depth, and recursive algorithms are typically used for tree traversal.
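To illustrate the expression-tree view, the sketch below walks a filters object recursively. The function names `collect_fields` and `depth` are hypothetical, and the traversal treats `and` and `or` as the only operators whose content is a list of child operators.

```python
LOGICAL_OPS = {"and", "or"}


def collect_fields(node):
    """Recursively gather the field names referenced in a filters tree."""
    if node["op"] in LOGICAL_OPS:
        fields = set()
        for child in node["content"]:
            fields |= collect_fields(child)
        return fields
    # Any other operator is treated as a leaf with a field operand.
    return {node["content"]["field"]}


def depth(node):
    """Return the number of operator levels in a filters tree."""
    if node["op"] in LOGICAL_OPS:
        return 1 + max((depth(child) for child in node["content"]), default=0)
    return 1


filters = {
    "op": "and",
    "content": [
        {"op": "=", "content": {"field": "subject.organism.id", "value": "9606"}},
        {"op": "=", "content": {"field": "sample.pcr_target.pcr_target_locus", "value": "TRB"}},
    ],
}

print(sorted(collect_fields(filters)))
print(depth(filters))
```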

The following operators are supported by the ADC API.

Operator	Operands	Value Data Types	Description	Example
=	field and value	string, number, integer, or boolean	equals	{"op":"=","content":{"field":"junction_aa","value":"CASSYIKLN"}}
!=	field and value	string, number, integer, or boolean	does not equal	{"op":"!=","content":{"field":"subject.organism.id","value":"9606"}}
<	field and value	number, integer	less than	{"op":"<","content":{"field":"sample.cell_number","value":1000}}
<=	field and value	number, integer	less than or equal	{"op":"<=","content":{"field":"sample.cell_number","value":1000}}
>	field and value	number, integer	greater than	{"op":">","content":{"field":"sample.cells_per_reaction","value":10000}}
>=	field and value	number, integer	greater than or equal	{"op":">=","content":{"field":"sample.cells_per_reaction","value":10000}}
is missing	field	n/a	field is missing or is null	{"op":"is missing","content":{"field":"sample.tissue"}}
is	field	n/a	identical to "is missing" operator, provided for GDC compatibility	{"op":"is","content":{"field":"sample.tissue"}}
is not missing	field	n/a	field is not missing and is not null	{"op":"is not missing","content":{"field":"sample.tissue"}}
not	field	n/a	identical to "is not missing" operator, provided for GDC compatibility	{"op":"not","content":{"field":"sample.tissue"}}
in	field, multiple values in a list	array of string, number, or integer	matches a string or number in a list	{"op":"in","content":{"field":"subject.strain_name","value":["C57BL/6","BALB/c","NOD"]}}
exclude	field, multiple values in a list	array of string, number, or integer	does not match any string or number in a list	{"op":"exclude","content":{"field":"subject.strain_name","value":["SCID","NOD"]}}
contains	field, value	string	contains the substring	{"op":"contains","content":{"field":"study.study_title","value":"cancer"}}
and	multiple operators	n/a	logical AND	{"op":"and","content":[ {"op":"!=","content":{"field":"subject.organism.id","value":"9606"}}, {"op":">=","content":{"field":"sample.cells_per_reaction","value":10000}}, {"op":"exclude","content":{"field":"subject.strain_name","value":["SCID","NOD"]}} ]}
or	multiple operators	n/a	logical OR	{"op":"or","content":[ {"op":"<","content":{"field":"sample.cell_number","value":1000}}, {"op":"is missing","content":{"field":"sample.tissue"}}, {"op":"exclude","content":{"field":"subject.organism.id","value":["9606","10090"]}} ]}

Note that the not operator is different from a logical NOT operator, and the logical NOT is not needed as the other operators provide negation.

The field operand specifies a fully qualified property name in the AIRR Data Model. Fully qualified AIRR properties are either a JSON/YAML base type (string, number, integer, or boolean) or an array of one of these base types (some AIRR fields are arrays e.g. study.keywords_study). The Fields section below describes the available queryable fields.

The value operand specifies one or more values when evaluating the operator for the field operand.
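A client may want to check the shape of a filters object before sending it. The following sketch does so against the operand rules in the operator table above. The function `validate` and its messages are hypothetical; a repository may apply stricter or different checks, for example on value data types, which this sketch does not attempt.

```python
FIELD_ONLY = {"is missing", "is", "is not missing", "not"}
FIELD_VALUE = {"=", "!=", "<", "<=", ">", ">=", "contains"}
FIELD_LIST = {"in", "exclude"}
LOGICAL = {"and", "or"}


def validate(node):
    """Return a list of problems found in a filters object (empty if none)."""
    op = node.get("op")
    content = node.get("content")
    if op in LOGICAL:
        if not isinstance(content, list) or not content:
            return [f"'{op}' needs a non-empty list of operators"]
        problems = []
        for child in content:
            problems.extend(validate(child))
        return problems
    if op not in FIELD_ONLY | FIELD_VALUE | FIELD_LIST:
        return [f"unknown operator: {op!r}"]
    if not isinstance(content, dict) or "field" not in content:
        return [f"'{op}' needs a field operand"]
    if op in FIELD_VALUE and "value" not in content:
        return [f"'{op}' needs a value operand"]
    if op in FIELD_LIST and not isinstance(content.get("value"), list):
        return [f"'{op}' needs a list of values"]
    return []


good = {"op": "contains", "content": {"field": "study.study_title", "value": "cancer"}}
bad = {
    "op": "or",
    "content": [
        {"op": "in", "content": {"field": "subject.strain_name", "value": "NOD"}},
        {"op": "like", "content": {"field": "sample.tissue", "value": "PBMC"}},
    ],
}
print(validate(good))
print(validate(bad))
```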

Queries Against Arrays

A number of fields in the AIRR Data Model are arrays, such as study.keywords_study which is an array of strings or subject.diagnosis which is an array of Diagnosis objects. A query operator on an array field will apply that operator to each entry in the array to decide if the query filter is satisfied. The behavior is different for various operators. For operators such as = and in, the filter behaves like the Boolean OR over the array entries, that is if any array entry evaluates to true then the query filter is satisfied. For operators such as != and exclude, the filter behaves like the Boolean AND over the array entries, that is all array entries must evaluate to true for the query filter to be satisfied.
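The OR and AND behaviour over array entries can be mimicked in a few lines of Python. This is only a model of the semantics described above, not repository code: the function `matches` is hypothetical, the keyword strings are illustrative values, and the behaviour for an empty array is whatever Python's `any` and `all` give rather than anything the specification states.

```python
def matches(op, field_value, query_value):
    """Model how a filter operator applies to a scalar or array field."""
    entries = field_value if isinstance(field_value, list) else [field_value]
    if op == "=":
        return any(entry == query_value for entry in entries)  # Boolean OR
    if op == "in":
        return any(entry in query_value for entry in entries)  # Boolean OR
    if op == "!=":
        return all(entry != query_value for entry in entries)  # Boolean AND
    if op == "exclude":
        return all(entry not in query_value for entry in entries)  # Boolean AND
    raise ValueError(f"operator not modelled in this sketch: {op}")


# Illustrative content for study.keywords_study.
keywords = ["contains_ig", "contains_tcr"]

print(matches("=", keywords, "contains_ig"))
print(matches("!=", keywords, "contains_ig"))
print(matches("exclude", keywords, ["contains_single_cell"]))
```

Note that `=` and `!=` are not complements on an array field: with the keywords above, both `=` and `!=` against a value held by only one entry can be reasoned about separately, since one asks about any entry and the other about all entries.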

Examples

A simple query with a single operator looks like this:

{
  "filters": {
    "op":"=",
    "content": {
      "field":"junction_aa",
      "value":"CASSYIKLN"
    }
  }
}

A more complex query with multiple operators looks like this:

{
  "filters": {
    "op":"and",
    "content": [
      {
        "op":"!=",
        "content": {
          "field":"subject.organism.id",
          "value":"9606"
        }
      },
      {
        "op":">=",
        "content": {
          "field":"sample.cells_per_reaction",
          "value":10000
        }
      },
      {
        "op":"exclude",
        "content": {
          "field":"subject.organism.id",
          "value": ["9606", "10090"]
        }
      }
    ]
  }
}

Format Query Parameter

The format parameter specifies the format of the API response. json is the default format and is available for all endpoints. The rearrangement POST endpoint also accepts tsv, which will provide the data in the AIRR TSV format. A specific ordering of fields in the TSV format should not be assumed from one API request to another. Take care to properly merge AIRR TSV data from multiple API requests, e.g. with the airr-tools merge program.
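Because column order may differ between requests, TSV data should be read by column name rather than by position. The sketch below shows one way to do that with Python's standard `csv` module. It is an illustration only and not a replacement for airr-tools; the function `merge_tsv` and the two sample chunks are made up for the example.

```python
import csv
import io


def merge_tsv(chunks):
    """Merge AIRR TSV texts whose columns may appear in different orders."""
    rows, columns = [], []
    for text in chunks:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)  # keep first-seen column order
        rows.extend(reader)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, delimiter="\t", restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)  # each row is written by column name, not position
    return out.getvalue()


first = "sequence_id\tv_call\ns1\tIGHV1-69*04\n"
second = "v_call\tsequence_id\nIGHV1-24*01\ts2\n"
merged = merge_tsv([first, second])
print(merged)
```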

Fields Query Parameter

The fields parameter specifies which fields are to be included in the API response. By default all fields (AIRR and non-AIRR) stored in the data repository are returned. However, this can vary between data repositories based upon how the repository decides to store blank or null fields, so the fields and/or include_fields parameter should be used to guarantee the existence of data elements in the response.

Include Fields Query Parameter

The include_fields parameter specifies that the API response should include a well-defined set of AIRR Standard fields. These sets include:

miairr, for only the MiAIRR fields.
airr-core, for the AIRR required and identifier fields. This is expected to be the most common option as it provides all MiAIRR fields, additional required fields useful for analysis, and all identifier fields for linking objects in the AIRR Data Model.
airr-schema, for all AIRR fields in the AIRR Schema.

The include_fields parameter is a mechanism to ensure that specific AIRR data elements are returned without requiring those fields to be individually provided with the fields parameter. Any data elements that lack a value will be assigned null in the response. Any empty array of objects, for example subject.diagnosis, will be populated with a single object with all of the object’s properties given a null value. Any empty array of primitive data types, like string or number, will be assigned null. Note that if both the include_fields and the fields parameter are provided, the API response will include the set of AIRR fields and in addition will include any additional fields that are specified in the fields parameter.

Size and From Query Parameters

The ADC API provides a pagination feature that limits the number of results returned by the API.

The from query parameter specifies which record to start from when returning results. This allows records to be skipped. The default value is 0 indicating that the first record in the set of results will be returned.

The size query parameter specifies the maximum number of results to return. The default value is specific to the data repository, and a maximum value may be imposed by the data repository. This is to prevent queries from “accidentally” returning millions of records. The info endpoint provides the data repository default and maximum values for the repertoire and rearrangement endpoints, which may have different values. A value of 0 indicates there is no limit on the number of results to return, but if the data repository does not support this then the default value will be used.

The combination of from and size can be used to implement pagination in a graphical user interface, or to split a very large download into smaller batches. For example, if an interface displays 10 records at a time, the request would assign size=10 and from=0 to get the ten results to display on the first page. When the user traverses to the “next page”, the request would assign from=10 to skip the first ten results and return the next ten results, and from=20 for the next page after that, and so on.
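The page arithmetic in this example can be written as a small helper. The function `page_request` is hypothetical and uses zero-based page numbers; only the `from`, `size`, and `filters` keys come from the API.

```python
def page_request(page, per_page, filters=None):
    """Build the request body for a zero-based page number (sketch)."""
    body = {"from": page * per_page, "size": per_page}
    if filters is not None:
        body["filters"] = filters
    return body


# First three pages of ten records each.
for page in range(3):
    print(page_request(page, 10))
```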

Facets Query Parameter

The facets parameter provides aggregate count information for the specified field. Only a single field can be specified. The facets parameter can be used in conjunction with the filters parameter to get aggregate counts for a set of search results. It returns the set of values for the field, and the number of records (repertoires or rearrangements) that have this value. For field values that have no counts, the API service can either return the field value with a 0 count or exclude the field value in the aggregation. The typical use of this parameter is for displaying aggregate information in a graphical user interface.

Here is a simple query with only the facets parameter to return the set of values for sample.pcr_target.pcr_target_locus and the count of repertoires that have each value. The content of the JSON payload.

{
    "facets":"sample.pcr_target.pcr_target_locus"
}

Sending this query in an API request.

curl --data @facets1_repertoire.json https://vdjserver.org/airr/v1/repertoire

The output from the request is similar to normal queries except the data is provided with the Facet key.

{
  "Info": {
    "title": "AIRR Data Commons API reference implementation",
    "description": "API response for repertoire query",
    "version": 1.3,
    "contact": {
      "name": "AIRR Community",
      "url": "https://github.com/airr-community"
    }
  },
  "Facet": [
    {"sample.pcr_target.pcr_target_locus":[["TRB"]],"count":40},
    {"sample.pcr_target.pcr_target_locus":[["IGH"]],"count":20}
  ]
}

Here is a query with both filters and facets parameters, which restricts the data records used for the facets count. The content of the JSON payload.

{
    "filters":{
        "op":"=",
        "content": {
            "field":"sample.pcr_target.pcr_target_locus",
            "value":"IGH"
        }
    },
    "facets":"subject.subject_id"
}

Sending this query in an API request.

curl --data @facets2_repertoire.json https://vdjserver.org/airr/v1/repertoire

Example output from the request. This result indicates there are ten subjects each with two IGH repertoires.

{
  "Info": {
    "title": "AIRR Data Commons API reference implementation",
    "description": "API response for repertoire query",
    "version": 1.3,
    "contact": {
      "name": "AIRR Community",
      "url": "https://github.com/airr-community"
    }
  },
  "Facet": [
    {"subject.subject_id":"TW05B","count":2},
    {"subject.subject_id":"TW05A","count":2},
    {"subject.subject_id":"TW03A","count":2},
    {"subject.subject_id":"TW04A","count":2},
    {"subject.subject_id":"TW01A","count":2},
    {"subject.subject_id":"TW04B","count":2},
    {"subject.subject_id":"TW02A","count":2},
    {"subject.subject_id":"TW03B","count":2},
    {"subject.subject_id":"TW01B","count":2},
    {"subject.subject_id":"TW02B","count":2}
  ]
}
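For display purposes, a Facet list like the one above can be turned into a lookup table. The sketch below assumes the facet values are plain strings, as in this subject.subject_id example; the helper `facet_counts` is hypothetical and the response is a shortened copy of the example output.

```python
def facet_counts(response, field):
    """Map each facet value to its count (assumes hashable facet values)."""
    return {entry[field]: entry["count"] for entry in response["Facet"]}


# Shortened copy of the example output above.
response = {
    "Facet": [
        {"subject.subject_id": "TW05B", "count": 2},
        {"subject.subject_id": "TW05A", "count": 2},
        {"subject.subject_id": "TW03A", "count": 2},
    ]
}

counts = facet_counts(response, "subject.subject_id")
print(counts)
print(sum(counts.values()))
```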

ADC API Limits and Thresholds

Repertoire endpoint query fields

It is expected that the number of repertoires in a data repository will never become so large that queries become computationally expensive. A data repository might have thousands of repertoires across hundreds of studies, yet such numbers are easily handled by databases. Based upon this, the ADC API does not place limits on the repertoire endpoint for the fields that can be queried or the operators that can be used.

Rearrangement endpoint query fields

Unlike repertoire data, data repositories are expected to store billions of rearrangement records, where performing “simple” queries can quickly become computationally expensive. Data repositories are encouraged to optimize their databases for performance. Therefore, based upon a set of query use cases provided by immunology experts, a minimal set of required fields was defined that can be queried. These required fields are described in the following table. The fields also have the AIRR extension property adc-query-support: true in the AIRR Schema.

Field(s)	Description
sequence_id, repertoire_id, sample_processing_id, data_processing_id, clone_id, cell_id	Identifiers; sequence_id allows for query of that specific rearrangement object in the repository, while repertoire_id, sample_processing_id, and data_processing_id are links to the repertoire metadata for the rearrangement. The clone_id and cell_id are identifiers that group rearrangements based on clone assignment and single cell assignment.
locus, v_call, d_call, j_call, c_call, productive, junction_aa, junction_aa_length	Commonly used rearrangement annotations.

Data repository specific limits

A data repository may impose limits on the size of the data returned. This might be because of limitations imposed by the back-end database being used or because of the need to manage the load placed on the server. For example, MongoDB databases have document size limits (16 megabytes) which limit the size of a query that can be sent to a repository and the size of a single repertoire or rearrangement object that is returned. As a result a repository might choose to set a maximum query size.

Size limits can be retrieved from the info endpoint. If the data repository does not provide a limit, then no limit is assumed.

Field	Description
`max_size`	The maximum value for the `size` query parameter. Attempting to retrieve data beyond this maximum should trigger an error response. The error response should include information about why the query failed and what the maximum size limit is.
`max_query_size`	The maximum size of the JSON query object.
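A client can compare a request against these limits before sending it. The sketch below is one possible check, with the `info` dictionary copied from the service info example earlier in this documentation. The function `check_limits` is hypothetical, and measuring the query as UTF-8 encoded JSON bytes is an assumption; a repository may measure query size differently.

```python
import json


def check_limits(query, info):
    """Return a list of limit violations for a query (empty if none)."""
    problems = []
    max_size = info.get("max_size")
    if max_size is not None and query.get("size", 0) > max_size:
        problems.append(f"size {query['size']} exceeds max_size {max_size}")
    max_query_size = info.get("max_query_size")
    encoded = json.dumps(query).encode("utf-8")  # assumed way of measuring size
    if max_query_size is not None and len(encoded) > max_query_size:
        problems.append(
            f"query is {len(encoded)} bytes, above max_query_size {max_query_size}"
        )
    return problems


info = {"max_size": 1000, "max_query_size": 2097152}
query = {
    "filters": {"op": "=", "content": {"field": "junction_aa", "value": "CARDPRSYHAFDIW"}},
    "size": 5000,
}
print(check_limits(query, info))
```

If the info response omits a limit, the sketch treats it as unlimited, matching the statement above that no limit is assumed in that case.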

Reference Implementation

The AIRR Community provides a reference implementation for an ADC API service with more information found here.
