Set	Subset	Designation	Field	Type	Format	Level	Definition	Example
1	study	Study ID	study_id	string	free text	important	"Unique ID assigned by study registry such as one of the International Nucleotide Sequence Database Collaboration (INSDC) repositories.
"	PRJNA001
1	study	Study title	study_title	string	free text	important	Descriptive study title	Effects of sunlight exposure on the Treg repertoire
1	study	Study type	study_type	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: NCIT:C63536, label: Study}}	important	Type of study design	id: NCIT:C15197, label: Case-Control Study
1	study	Study inclusion/exclusion criteria	inclusion_exclusion_criteria	string	free text	important	List of criteria for inclusion/exclusion for the study	Include: Clinical P. falciparum infection; Exclude: Seropositive for HIV
1	study	Grant funding agency	grants	string	free text	important	Funding agencies and grant numbers	NIH, award number R01GM987654
1	study	Contact information (data collection)	collected_by	string	free text	important	"Full contact information of the data collector, i.e. the person who is legally responsible for data collection and release. This should include an e-mail address and a persistent identifier such as an ORCID ID.
"	Dr. P. Stibbons, p.stibbons@unseenu.edu, https://orcid.org/0000-0002-1825-0097
1	study	Lab name	lab_name	string	free text	important	Department of data collector	Department for Planar Immunology
1	study	Lab address	lab_address	string	free text	important	Institution and institutional address of data collector	School of Medicine, Unseen University, Ankh-Morpork, Disk World
1	study	Contact information (data deposition)	submitted_by	string	free text	important	"Full contact information of the data depositor, i.e., the person submitting the data to a repository. This should include an e-mail address and a persistent identifier such as an ORCID ID. This is supposed to be a short-lived and technical role until the submission is released.
"	Adrian Turnipseed, a.turnipseed@unseenu.edu, https://orcid.org/0000-0002-1825-0097
1	study	Relevant publications	pub_ids	string	free text	important	"Publications describing the rationale and/or outcome of the study. Wherever possible, a persistent identifier should be used such as a DOI or a PubMed ID
"	PMID:85642
1	study	Keywords for study	keywords_study	array of string		important	"Keywords describing properties of one or more data sets in a study. ""contains_schema"" keywords indicate that the study contains data objects from the AIRR Schema of that type (Rearrangement, Clone, Cell, Receptor) while the other keywords indicate that the study design considers the type of data indicated (e.g. it is possible to have a study that ""contains_paired_chain"" but does not ""contains_schema_cell"").
"	['contains_ig', 'contains_schema_rearrangement', 'contains_schema_clone', 'contains_schema_cell']
1	subject	Subject ID	subject_id	string	free text	important	"Subject ID assigned by submitter, unique within study. If possible, a persistent subject ID linked to an INSDC or similar repository study should be used.
"	SUB856413
1	subject	Synthetic library	synthetic	boolean	true | false	essential	TRUE for libraries in which the diversity has been synthetically generated (e.g. phage display)	
1	subject	Organism	species	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: NCBITAXON:7776, label: Gnathostomata}}	essential	Binomial designation of subject's species	id: NCBITAXON:9606, label: Homo sapiens
1	subject	Sex	sex	string	free text	important	Biological sex of subject	female
1	subject	Age minimum	age_min	number	positive number	important	Specific age or lower boundary of age range.	60
1	subject	Age maximum	age_max	number	positive number	important	"Upper boundary of age range or equal to age_min for specific age. This field should only be null if age_min is null.
"	80
1	subject	Age unit	age_unit	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: UO:0000003, label: time unit}}	important	Unit of age range	id: UO:0000036, label: year
1	subject	Age event	age_event	string	free text	important	"Event in the study schedule to which `Age` refers. For NCBI BioSample this MUST be `sampling`. For other implementations submitters need to be aware that there is currently no mechanism to encode the potential delta between `Age event` and `Sample collection time`, hence the chosen events should be in temporal proximity.
"	enrollment
1	subject	Ancestry population	ancestry_population	string	free text	important	Broad geographic origin of ancestry (continent)	list of continents, mixed or unknown
1	subject	Ethnicity	ethnicity	string	free text	important	Ethnic group of subject (defined as cultural/language-based membership)	English, Kurds, Manchu, Yakuts (and other fields from Wikipedia)
1	subject	Race	race	string	free text	important	Racial group of subject (as defined by NIH)	White, American Indian or Alaska Native, Black, Asian, Native Hawaiian or Other Pacific Islander, Other
1	subject	Strain name	strain_name	string	free text	important	Non-human designation of the strain or breed of animal used	C57BL/6J
1	subject	Relation to other subjects	linked_subjects	string	free text	important	Subject ID to which `Relation type` refers	SUB1355648
1	subject	Relation type	link_type	string	free text	important	Relation between subject and `linked_subjects`, can be genetic or environmental (e.g. exposure)	father, daughter, household
1	diagnosis and intervention	Study group description	study_group_description	string	free text	important	Designation of study arm to which the subject is assigned	control
1	diagnosis and intervention	Diagnosis	disease_diagnosis	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: DOID:4, label: disease}}	important	Diagnosis of subject	id: DOID:9538, label: multiple myeloma
1	diagnosis and intervention	Length of disease	disease_length	string	free text	important	Time duration between initial diagnosis and current intervention	23 months
1	diagnosis and intervention	Disease stage	disease_stage	string	free text	important	Stage of disease at current intervention	Stage II
1	diagnosis and intervention	Prior therapies for primary disease under study	prior_therapies	string	free text	important	List of all relevant previous therapies applied to subject for treatment of `Diagnosis`	melphalan/prednisone
1	diagnosis and intervention	Immunogen/agent	immunogen	string	free text	important	Antigen, vaccine or drug applied to subject at this intervention	bortezomib
1	diagnosis and intervention	Intervention definition	intervention	string	free text	important	Description of intervention	systemic chemotherapy, 6 cycles, 1.25 mg/m2
1	diagnosis and intervention	Other relevant medical history	medical_history	string	free text	important	Medical history of subject that is relevant to assess the course of disease and/or treatment	MGUS, first diagnosed 5 years prior
2	sample	Biological sample ID	sample_id	string	free text	important	"Sample ID assigned by submitter, unique within study. If possible, a persistent sample ID linked to INSDC or similar repository study should be used.
"	SUP52415
2	sample	Sample type	sample_type	string	free text	important	The way the sample was obtained, e.g. fine-needle aspirate, organ harvest, peripheral venous puncture	Biopsy
2	sample	Tissue	tissue	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: UBERON:0010000, label: multicellular anatomical structure}}	important	The actual tissue sampled, e.g. lymph node, liver, peripheral blood	id: UBERON:0002371, label: bone marrow
2	sample	Anatomic site	anatomic_site	string	free text	important	The anatomic location of the tissue, e.g. Inguinal, femur	Iliac crest
2	sample	Disease state of sample	disease_state_sample	string	free text	important	Histopathologic evaluation of the sample	Tumor infiltration
2	sample	Sample collection time	collection_time_point_relative	number	positive number	important	Time point at which sample was taken, relative to `Collection time event`	14
2	sample	Sample collection time unit	collection_time_point_relative_unit	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: UO:0000003, label: time unit}}	important	Unit of Sample collection time	id: UO:0000033, label: day
2	sample	Collection time event	collection_time_point_reference	string	free text	important	Event in the study schedule to which `Sample collection time` relates	Primary vaccination
2	sample	Biomaterial provider	biomaterial_provider	string	free text	important	Name and address of the entity providing the sample	Tissues-R-Us, Tampa, FL, USA
3	process (cell)	Tissue processing	tissue_processing	string	free text	important	Enzymatic digestion and/or physical methods used to isolate cells from sample	Collagenase A/DNase I digested, followed by Percoll gradient
3	process (cell)	Cell subset	cell_subset	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: CL:0000542, label: lymphocyte}}	important	Commonly-used designation of isolated cell population	id: CL:0000972, label: class switched memory B cell
3	process (cell)	Cell subset phenotype	cell_phenotype	string	free text	important	List of cellular markers and their expression levels used to isolate the cell population	CD19+ CD38+ CD27+ IgM- IgD-
3	process (cell)	Cell species	cell_species	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: NCBITAXON:7776, label: Gnathostomata}}	defined	"Binomial designation of the species from which the analyzed cells originate. Typically, this value should be identical to `species`, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g., chimeric animal models. If set, this key will overwrite the `species` information for all lower layers of the schema.
"	id: NCBITAXON:9606, label: Homo sapiens
3	process (cell)	Single-cell sort	single_cell	boolean	true | false	important	TRUE if single cells were isolated into separate compartments	
3	process (cell)	Number of cells in experiment	cell_number	integer	positive integer	important	Total number of cells that went into the experiment	1000000
3	process (cell)	Number of cells per sequencing reaction	cells_per_reaction	integer	positive integer	important	Number of cells for each biological replicate	50000
3	process (cell)	Cell storage	cell_storage	boolean	true | false	important	TRUE if cells were cryo-preserved between isolation and further processing	True
3	process (cell)	Cell quality	cell_quality	string	free text	important	Relative amount of viable cells after preparation and (if applicable) thawing	90% viability as determined by 7-AAD
3	process (cell)	Cell isolation / enrichment procedure	cell_isolation	string	free text	important	Description of the procedure used for marker-based isolation or enrichment of cells	"Cells were stained with fluorochrome labeled antibodies and then sorted on a FlowMerlin (CE) cytometer.
"
3	process (cell)	Processing protocol	cell_processing_protocol	string	free text	important	"Description of the methods applied to the sample including cell preparation/isolation/enrichment and nucleic acid extraction. This should closely mirror the Materials and methods section in the manuscript.
"	Stimulated with anti-CD3/anti-CD28
3	process (nucleic acid)	Target substrate	template_class	string	free text	essential	"The class of nucleic acid that was used as primary starting material for the following procedures
"	RNA
3	process (nucleic acid)	Target substrate quality	template_quality	string	free text	important	Description and results of the quality control performed on the template material	RIN 9.2
3	process (nucleic acid)	Template amount	template_amount	number	positive number	important	Amount of template that went into the process	1000
3	process (nucleic acid)	Template amount unit	template_amount_unit	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: UO:0000002, label: physical quantity}}	important	Unit of template amount	id: UO:0000024, label: nanogram
3	process (nucleic acid)	Library generation method	library_generation_method	string	free text	essential	Generic type of library generation	RT(oligo-dT)+TS(UMI)+PCR
3	process (nucleic acid)	Library generation protocol	library_generation_protocol	string	free text	important	Description of processes applied to substrate to obtain a library that is ready for sequencing	cDNA was generated using
3	process (nucleic acid)	Protocol IDs	library_generation_kit_version	string	free text	important	When using a library generation protocol from a commercial provider, provide the protocol version number	v2.1 (2016-09-15)
3	process (nucleic acid)	Complete sequences	complete_sequences	string	free text	essential	"To be considered `complete`, the procedure used for library construction MUST generate sequences that 1) include the first V gene codon that encodes the mature polypeptide chain (i.e. after the leader sequence) and 2) include the last complete codon of the J gene (i.e. 1 bp 5' of the J->C splice site) and 3) provide sequence information for all positions between 1) and 2). To be considered `complete & untemplated`, the sections of the sequences defined in points 1) to 3) of the previous sentence MUST be untemplated, i.e. MUST NOT overlap with the primers used in library preparation. `mixed` should only be used if the procedure used for library construction will likely produce multiple categories of sequences in the given experiment. It SHOULD NOT be used as a replacement of a NULL value.
"	partial
3	process (nucleic acid)	Physical linkage of different rearrangements	physical_linkage	string	free text	essential	"In case an experimental setup is used that physically links nucleic acids derived from distinct `Rearrangements` before library preparation, this field describes the mode of that linkage. All `hetero_*` terms indicate that in case of paired-read sequencing, the two reads should be expected to map to distinct IG/TR loci. `*_head-head` refers to techniques that link the 5' ends of transcripts in a single-cell context. `*_tail-head` refers to techniques that link the 3' end of one transcript to the 5' end of another one in a single-cell context. This term does not provide any information on whether a continuous reading frame between the two is generated. `*_prelinked` refers to constructs in which the linkage was already present on the DNA level (e.g. scFv).
"	hetero_head-head
3	process (nucleic acid [pcr])	Target locus for PCR	pcr_target_locus	string	free text	important	"Designation of the target locus. Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature.
"	IGK
3	process (nucleic acid [pcr])	Forward PCR primer target location	forward_pcr_primer_target_location	string	free text	important	Position of the most distal nucleotide templated by the forward primer or primer mix	IGHV, +23
3	process (nucleic acid [pcr])	Reverse PCR primer target location	reverse_pcr_primer_target_location	string	free text	important	Position of the most proximal nucleotide templated by the reverse primer or primer mix	IGHG, +57
3	process (sequencing)	Batch number	sequencing_run_id	string	free text	important	ID of sequencing run assigned by the sequencing facility	160101_M01234
3	process (sequencing)	Total reads passing QC filter	total_reads_passing_qc_filter	integer	positive integer	important	Number of usable reads for analysis	10365118
3	process (sequencing)	Sequencing platform	sequencing_platform	string	free text	important	Designation of sequencing instrument used	Alumina LoSeq 1000
3	process (sequencing)	Sequencing facility	sequencing_facility	string	free text	important	Name and address of sequencing facility	Seqs-R-Us, Vancouver, BC, Canada
3	process (sequencing)	Date of sequencing run	sequencing_run_date	string	free text	important	Date of sequencing run	2016-12-16
3	process (sequencing)	Sequencing kit	sequencing_kit	string	free text	important	Name, manufacturer, order and lot numbers of sequencing kit	FullSeq 600, Alumina, #M123456C0, 789G1HK
4	data (raw reads)	Raw sequencing data persistent identifier	sequencing_data_id	string	free text	important	"Persistent identifier of raw data stored in an archive (e.g. INSDC run ID). Data archive should be identified in the CURIE prefix.
"	SRA:SRR11610494
4	data (raw reads)	Raw sequencing data file type	file_type	string	free text	important	File format for the raw reads or sequences	
4	data (raw reads)	Raw sequencing data file name	filename	string	free text	important	File name for the raw reads or sequences. The first file in paired-read sequencing.	MS10R-NMonson-C7JR9_S1_R1_001.fastq
4	data (raw reads)	Read direction	read_direction	string	free text	important	Read direction for the raw reads or sequences. The first file in paired-read sequencing.	forward
4	data (raw reads)	Forward read length	read_length	integer	positive integer	important	Read length in bases for the first file in paired-read sequencing	300
4	data (raw reads)	Paired raw sequencing data file name	paired_filename	string	free text	important	File name for the second file in paired-read sequencing	MS10R-NMonson-C7JR9_S1_R2_001.fastq
4	data (raw reads)	Paired read direction	paired_read_direction	string	free text	important	Read direction for the second file in paired-read sequencing	reverse
4	data (raw reads)	Paired read length	paired_read_length	integer	positive integer	important	Read length in bases for the second file in paired-read sequencing	300
5	process (computational)	Software tools and version numbers	software_versions	string	free text	important	Version number and/or date, including company pipelines	IgBLAST 1.6
5	process (computational)	Paired read assembly	paired_reads_assembly	string	free text	important	How paired end reads were assembled into a single receptor sequence	PandaSeq (minimal overlap 50, threshold 0.8)
5	process (computational)	Quality thresholds	quality_thresholds	string	free text	important	How/if sequences were removed from (4) based on base quality scores	Average Phred score >=20
5	process (computational)	Primer match cutoffs	primer_match_cutoffs	string	free text	important	How primers were identified in the sequences and whether they were removed, masked, etc.	Hamming distance <= 2
5	process (computational)	Collapsing method	collapsing_method	string	free text	important	The method used for combining multiple sequences from (4) into a single sequence in (5)	MUSCLE 3.8.31
5	process (computational)	Data processing protocols	data_processing_protocols	string	free text	important	General description of how QC is performed	Data was processed using [...]
5	data (processed sequence)	V(D)J germline reference database	germline_database	string	free text	important	Source of germline V(D)J genes with version number or date accessed.	ENSEMBL, Homo sapiens build 90, 2017-10-01
