Guide for submission of AIRR-seq data to NCBI

This site provides a detailed “how-to” guide for submission of AIRR-seq data to NCBI repositories (BioProject, BioSample, SRA and GenBank). For other implementations of the MiAIRR standard, see here.

One of the primary initiatives of the AIRR (Adaptive Immune Receptor Repertoire) Community has been to develop a set of metadata standards for the submission of immune receptor repertoire sequencing datasets. This work has been carried out by the AIRR Community Standards Working Group. To support reproducibility, standard quality control, and data deposition in a common repository, the AIRR Community has agreed on six high-level data sets that will guide the publication, curation and sharing of AIRR-seq data and metadata: study and subject, sample collection, sample processing and sequencing, raw sequences, processing of sequence data, and processed AIRR sequences. The detailed data elements within these sets are defined here (Download as TSV). The association between these six data sets, their data elements, and each of the NCBI repositories is shown below:

[Figure: MiAIRR data elements and their NCBI target repositories (MiAIRR_data_elements_NCBI_targets.png)]

Submission of AIRR sequencing data and metadata to NCBI’s public data repositories consists of five sequential steps:

  1. Submit study information to NCBI BioProject using the NCBI web interface.

  2. Submit sample-level information to the NCBI BioSample repository using the AIRR-BioSample templates.

  3. Submit raw sequencing data to NCBI SRA using the AIRR-SRA data templates.

  4. Use Zenodo to generate a DOI for the protocol describing how the raw sequencing data were processed.

  5. Submit processed sequencing data with sequence-level annotations to GenBank using AIRR feature tags.

The submission manual provides detailed instructions for carrying out each of these steps for an AIRR study submission.
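
Because the five steps are sequential, it can help to track them as an ordered checklist. The following is a minimal sketch, not part of any AIRR or NCBI tooling: the `SubmissionStep` class, its field names, and the `next_step` helper are hypothetical names introduced here for illustration, and the step descriptions are simply taken from the list above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SubmissionStep:
    """One step of the submission workflow (illustrative structure only)."""
    order: int
    content: str   # what is submitted or generated
    target: str    # repository or service that receives it
    method: str    # how it is done, as named in this guide


# The five steps exactly as listed in this guide.
STEPS = (
    SubmissionStep(1, "study information", "BioProject", "NCBI web interface"),
    SubmissionStep(2, "sample-level information", "BioSample", "AIRR-BioSample templates"),
    SubmissionStep(3, "raw sequencing data", "SRA", "AIRR-SRA data templates"),
    SubmissionStep(4, "DOI for the data processing protocol", "Zenodo", "DOI generation"),
    SubmissionStep(5, "processed, annotated sequences", "GenBank", "AIRR feature tags"),
)


def next_step(completed: Iterable[int]) -> Optional[SubmissionStep]:
    """Return the first step not yet completed, or None if all are done.

    Raises ValueError if a later step is marked complete while an
    earlier one is still missing, since the steps are sequential.
    """
    done = set(completed)
    for step in STEPS:
        if step.order not in done:
            skipped_ahead = sorted(n for n in done if n > step.order)
            if skipped_ahead:
                raise ValueError(
                    f"step {step.order} ({step.target}) is missing, "
                    f"but later steps {skipped_ahead} are marked complete"
                )
            return step
    return None
```

For example, `next_step([1, 2])` returns the SRA step, and `next_step([1, 3])` raises an error because the BioSample step was skipped.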