Asynchronous API#
The ADC API is a synchronous API, which means that a query is executed immediately and all data satisfying the query is returned in the response, regardless of the length of time to execute the query and the amount of data returned in the response. The network connection between the client and server is maintained until the request is complete.
Repositories may wish to have better control over queries that either take a long time to run or which return large amounts of data. Partial control is provided by utilizing the max_size limit, which restricts the amount of data returned; however, there is nothing to restrict the number of concurrent queries nor how long an individual query can run. For network services with time restrictions, this can cause timeouts as the client/server connection is cut if queries take too long.
The ADC Asynchronous API is an extension that repositories may implement which provides control over query execution. Queries are performed asynchronously, which means that incoming query requests return immediately with a status record, queries are executed in the future by the repository, and the status record is updated when the query is complete and the data is ready.
Currently only the rearrangement
endpoint is required, but other
endpoints may be added in future versions. The Async API mirrors the ADC
API with the query parameters that are accepted with the exception of
facets
(this may be added in future versions).
Repositories should assign a unique query_id
for every incoming
query request, which will allow the status of the query to be tracked.
Requests may specify a URL for notifications as the query status
changes; however, notifications do not have guaranteed delivery so
client programs should utilize polling as a backup. Repositories should
update the query status and send out notifications, as the query goes
through various processing stages. The current statuses that repositories
must support are listed below.
PENDING: A query which has been received but not yet completely accepted for submission. The repository may perform additional error checks before accepting the query.
SUBMITTED: A query that has passed any initial error checks and is submitted to be executed.
PROCESSING: A query that is currently being processed.
FINISHED: A query that has finished and data is available.
ERROR: A query where an error occurred that prevents completion.
EXPIRED: A query that has finished but has expired and the data is no longer available.
When a query is FINISHED, the data should be made available with a download URL. The download does not have to be through the same API service, which allows repositories to utilize services that are more efficient at downloading large data files.
Async API Endpoints#
The ADC Async API is versioned with the major version number (v1
) as
part of the base path (/airr/async/v1
) for all endpoints. Each endpoint provides specific
functionality as summarized in the following table:
Endpoint |
Type |
HTTP |
Description |
---|---|---|---|
|
Service status |
|
Returns success if API service is running. |
|
Service information |
|
Upon success, returns service information such as name, version, etc. |
|
Query rearrangements asynchronously |
|
Upon success, returns a query status object with a unique |
|
Retrieve query status given its |
|
Upon success, returns the query status information. |
Request Asynchronous Query on Rearrangements#
The rearrangement
endpoint provides access to all fields in
the Rearrangement schema. Unlike
the ADC API which limits the set of queryable fields, any field
can be included in an asynchronous query.
This example queries for rearrangements with a specific junction amino acid sequence among a set of repertoires. This is the exact same query that can be sent to the synchronous ADC API.
curl --data @query1_rearrangement.json -H 'content-type: application/json' https://vdjserver.org/airr/async/v1/rearrangement
The content of the JSON payload
.
{
"filters":{
"op":"and",
"content": [
{
"op":"in",
"content": {
"field":"repertoire_id",
"value":[
"2603354229190496746-242ac113-0001-012",
"2618085967015776746-242ac113-0001-012",
"2633633748627296746-242ac113-0001-012",
"2564613624180576746-242ac113-0001-012"
]
}
},
{
"op":"=",
"content": {
"field":"junction_aa",
"value":"CARDPRSYHAFDIW"
}
}
]
},
"fields":["repertoire_id","sequence_id","v_call","productive"],
"format":"tsv"
}
Here is an example response with the unique query_id
.
{
"message":"rearrangement lrq submitted.",
"query_id":"1978706891589873170-242ac118-0001-012"
}
Retrieve Status of an Asynchronous Query#
The status of an asynchronous query can be retrieved at any time by
sending a request with the query_id
. Here is an example using the
query_id
from the above request.
curl https://vdjserver.org/airr/async/v1/status/1978706891589873170-242ac118-0001-012
Here is an example response:
{
"query_id":"1978706891589873170-242ac118-0001-012",
"endpoint":"rearrangement",
"status":"FINISHED",
"message":null,
"created":"2024-02-25T16:19:07.801-06:00",
"estimated_count":10,
"final_file":"1978706891589873170-242ac118-0001-012.airr.tsv",
"download_url":"https://vdj-agave-api.tacc.utexas.edu/postits/v2/f8d52cdb-9234-41e3-a908-95c9034a3360-010"
}
Retrieve Data for an Asynchronous Query#
When a query is FINISHED, the download_url
field will provide the
URL for retrieving the data. The final_file
field is a repository
assigned file name, usually made unique in some way by using the
query_id
, that is provided for convenience, but clients are allowed
to use whatever filename they wish. Here is an example to download the
data for above query.
curl -o 1978706891589873170-242ac118-0001-012.airr.tsv https://vdj-agave-api.tacc.utexas.edu/postits/v2/f8d52cdb-9234-41e3-a908-95c9034a3360-010
Request Notifications for Query Status Changes#
The client can request that a URL be called by the repository whenever
the status of a query changes. This can only be done with the initial
query request. A notification object is provided with the initial query
request that specifies the URL to be called, whether the GET or POST
method is used, and what status changes to receive. This object is
provided with the notification
parameter. Here is the above example
query request where a notification URL has been added.
{
"filters":{
"op":"and",
"content": [
{
"op":"in",
"content": {
"field":"repertoire_id",
"value":[
"2603354229190496746-242ac113-0001-012",
"2618085967015776746-242ac113-0001-012",
"2633633748627296746-242ac113-0001-012",
"2564613624180576746-242ac113-0001-012"
]
}
},
{
"op":"=",
"content": {
"field":"junction_aa",
"value":"CARDPRSYHAFDIW"
}
}
]
},
"fields":["repertoire_id","sequence_id","v_call","productive"],
"format":"tsv",
"notification":{
"url"": "https://notify.example.com",
"method": "POST"
}
}
When sending a notification, the repository will call the URL using requested method and
provide data with information about the query status change. This data is exactly the
same as what is returned by the status
endpoint.