Set	Subset	Designation	Field	Type	Format	Level	Definition	Example
1	study	Study ID	study_id	string	free text	important	"Unique ID assigned by study registry such as one of the International Nucleotide Sequence Database Collaboration (INSDC) repositories.
"	PRJNA001
1	study	Study title	study_title	string	free text	important	Descriptive study title	Effects of sunlight exposure on the Treg repertoire
1	study	Study type	study_type	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: NCIT:C63536, label: Study}}	important	Type of study design	id: NCIT:C15197, label: Case-Control Study
1	study	Study inclusion/exclusion criteria	inclusion_exclusion_criteria	string	free text	important	List of criteria for inclusion/exclusion for the study	Include: Clinical P. falciparum infection; Exclude: Seropositive for HIV
1	study	Grant funding agency	grants	string	free text	important	Funding agencies and grant numbers	NIH, award number R01GM987654
1	study	Contributors	contributors	array of :ref:`Contributor <ContributorFields>`		essential	"List of individuals who contributed to the study. Note that these are not necessarily identical with the authors on an associated manuscript or other scholarly communication. Further note that typically at least the three CRediT contributor roles ""supervision"", ""investigation"" and ""data curation"" should be assigned. The corresponding author should be listed last.
"	
1	study	Relevant publications	pub_ids	array of string		important	"Array of publications describing the rationale and/or outcome of the study as an array of CURIE objects such as a DOI or PubMed ID. Where more than one publication is given, if there is a primary publication for the study it should come first.
"	['PMID:29144493', 'DOI:10.1038/ni.3873']
1	study	Keywords for study	keywords_study	array of string	Controlled vocabulary: contains_ig, contains_tr, contains_paired_chain, contains_schema_rearrangement, contains_schema_clone, contains_schema_cell, contains_schema_receptor, contains_schema_cellexpression, contains_schema_receptorreactivity	important	"Keywords describing properties of one or more data sets in a study. ""contains_schema"" keywords indicate that the study contains data objects from the AIRR Schema of that type (Rearrangement, Clone, Cell, Receptor) while the other keywords indicate that the study design considers the type of data indicated (e.g. it is possible to have a study that ""contains_paired_chain"" but does not ""contains_schema_cell"").
"	['contains_ig', 'contains_schema_rearrangement', 'contains_schema_clone', 'contains_schema_cell']
1	subject	Subject ID	subject_id	string	free text	important	"Subject ID assigned by submitter, unique within study. If possible, a persistent subject ID linked to an INSDC or similar repository study should be used.
"	SUB856413
1	subject	Synthetic library	synthetic	boolean	true | false	essential	TRUE for libraries in which the diversity has been synthetically generated (e.g. phage display)	
1	subject	Organism	species	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: NCBITAXON:7776, label: Gnathostomata}}	essential	Binomial designation of subject's species	id: NCBITAXON:9606, label: Homo sapiens
1	subject	Sex	sex	string	Controlled vocabulary: male, female, pooled, hermaphrodite, intersex	important	Biological sex of subject	female
1	subject		age	:ref:`TimeInterval <TimeIntervalFields>`		important	"Age of subject expressed as a time interval. If singular time point then min == max in the time interval.
"	
1	subject	Age event	age_event	string	free text	important	"Event in the study schedule to which `Age` refers. For NCBI BioSample this MUST be `sampling`. For other implementations submitters need to be aware that there is currently no mechanism to encode the potential delta between `Age event` and `Sample collection time`, hence the chosen events should be in temporal proximity.
"	enrollment
1	subject	Ancestry population	ancestry_population	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: GAZ:00000448, label: geographic location}}	important	Broad geographic origin of ancestry (continent)	id: GAZ:00000459, label: South America
1	subject		location_birth	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: GAZ:00000448, label: geographic location}}	important	Self-reported location of birth of the subject, preferred granularity is country-level	id: GAZ:00002939, label: Poland
1	subject	Ethnicity	ethnicity	string	free text	important	Ethnic group of subject (defined as cultural/language-based membership)	English, Kurds, Manchu, Yakuts (and other fields from Wikipedia)
1	subject	Race	race	string	free text	important	Racial group of subject (as defined by NIH)	White, American Indian or Alaska Native, Black, Asian, Native Hawaiian or Other Pacific Islander, Other
1	subject	Strain name	strain_name	string	free text	important	Non-human designation of the strain or breed of animal used	C57BL/6J
1	subject	Relation to other subjects	linked_subjects	string	free text	important	Subject ID to which `Relation type` refers	SUB1355648
1	subject	Relation type	link_type	string	free text	important	Relation between subject and `linked_subjects`, can be genetic or environmental (e.g. exposure)	father, daughter, household
1	diagnosis and intervention	Study group description	study_group_description	string	free text	important	Designation of study arm to which the subject is assigned	control
1	diagnosis and intervention	Diagnosis timepoint	diagnosis_timepoint	:ref:`TimePoint <TimePointFields>`		important	Time point for the diagnosis	label: Study enrollment, value: 60, unit: { id: UO:0000033, label: day}
1	diagnosis and intervention	Diagnosis	disease_diagnosis	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: DOID:4, label: disease}}	important	Diagnosis of subject	id: DOID:9538, label: multiple myeloma
1	diagnosis and intervention	Length of disease	disease_length	:ref:`TimeQuantity <TimeQuantityFields>`		important	Time duration between initial diagnosis and current intervention	quantity: 23, unit: { id: UO:0000035, label: month}
1	diagnosis and intervention	Disease stage	disease_stage	string	free text	important	Stage of disease at current intervention	Stage II
1	diagnosis and intervention	Prior therapies for primary disease under study	prior_therapies	string	free text	important	List of all relevant previous therapies applied to subject for treatment of `Diagnosis`	melphalan/prednisone
1	diagnosis and intervention	Immunogen/agent	immunogen	string	free text	important	Antigen, vaccine or drug applied to subject at this intervention	bortezomib
1	diagnosis and intervention	Intervention definition	intervention	string	free text	important	Description of intervention	systemic chemotherapy, 6 cycles, 1.25 mg/m2
1	diagnosis and intervention	Other relevant medical history	medical_history	string	free text	important	Medical history of subject that is relevant to assess the course of disease and/or treatment	MGUS, first diagnosed 5 years prior
2	sample	Biological sample ID	sample_id	string	free text	important	"Sample ID assigned by submitter, unique within study. If possible, a persistent sample ID linked to an INSDC or similar repository study should be used.
"	SUP52415
2	sample	Sample type	sample_type	string	free text	important	The way the sample was obtained, e.g. fine-needle aspirate, organ harvest, peripheral venous puncture	Biopsy
2	sample	Tissue	tissue	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: UBERON:0010000, label: multicellular anatomical structure}}	important	The actual tissue sampled, e.g. lymph node, liver, peripheral blood	id: UBERON:0002371, label: bone marrow
2	sample	Anatomic site	anatomic_site	string	free text	important	The anatomic location of the tissue, e.g. Inguinal, femur	Iliac crest
2	sample	Disease state of sample	disease_state_sample	string	free text	important	Histopathologic evaluation of the sample	Tumor infiltration
2	sample	Sample collection time	collection_time_point_relative	:ref:`TimePoint <TimePointFields>`		important	Time point at which sample was taken, relative to `label` event	label: Primary vaccination, value: 14, unit: { id: UO:0000033, label: day}
2	sample	Sample collection location	collection_location	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: GAZ:00000448, label: geographic location}}	important	Location where the sample was taken, preferred granularity is country-level	id: GAZ:00002939, label: Poland
2	sample	Biomaterial provider	biomaterial_provider	string	free text	important	Name and address of the entity providing the sample	Tissues-R-Us, Tampa, FL, USA
3	process (cell)	Tissue processing	tissue_processing	string	free text	important	Enzymatic digestion and/or physical methods used to isolate cells from sample	Collagenase A/DNase I digested, followed by Percoll gradient
3	process (cell)	Cell subset	cell_subset	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: CL:0000542, label: lymphocyte}}	important	Commonly-used designation of isolated cell population	id: CL:0000972, label: class switched memory B cell
3	process (cell)	Cell subset phenotype	cell_phenotype	string	free text	important	"List of cellular markers and their expression levels used to isolate the cell population.
"	CD19+ CD38+ CD27+ IgM- IgD-
3	process (cell)	Cell annotation	cell_label	string	free text	defined	"Free text cell type annotation. Primarily used for annotating cell types that are not provided in the Cell Ontology.
"	age-associated B cell
3	process (cell)	Cell species	cell_species	:ref:`Ontology <OntoVoc>`	Ontology: { top_node: { id: NCBITAXON:7776, label: Gnathostomata}}	defined	"Binomial designation of the species from which the analyzed cells originate. Typically, this value should be identical to `species`, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g., chimeric animal models. If set, this key will overwrite the `species` information for all lower layers of the schema.
"	id: NCBITAXON:9606, label: Homo sapiens
3	process (cell)	Single-cell sort	single_cell	boolean	true | false	important	TRUE if single cells were isolated into separate compartments	
3	process (cell)	Number of cells in experiment	cell_number	integer	positive integer	important	Total number of cells that went into the experiment	1000000
3	process (cell)	Number of cells per sequencing reaction	cells_per_reaction	integer	positive integer	important	Number of cells for each biological replicate	50000
3	process (cell)	Cell storage	cell_storage	boolean	true | false	important	TRUE if cells were cryo-preserved between isolation and further processing	True
3	process (cell)	Cell quality	cell_quality	string	free text	important	Relative amount of viable cells after preparation and (if applicable) thawing	90% viability as determined by 7-AAD
3	process (cell)	Cell isolation / enrichment procedure	cell_isolation	string	free text	important	Description of the procedure used for marker-based isolation or enrichment of cells	"Cells were stained with fluorochrome-labeled antibodies and then sorted on a FlowMerlin (CE) cytometer.
"
3	process (cell)	Processing protocol	cell_processing_protocol	string	free text	important	"Description of the methods applied to the sample including cell preparation/isolation/enrichment and nucleic acid extraction. This should closely mirror the Materials and methods section in the manuscript.
"	Stimulated with anti-CD3/anti-CD28
3	process (nucleic acid)	Target substrate	template_class	string	Controlled vocabulary: DNA, RNA	essential	"The class of nucleic acid that was used as primary starting material for the following procedures
"	RNA
3	process (nucleic acid)	Target substrate quality	template_quality	string	free text	important	Description and results of the quality control performed on the template material	RIN 9.2
3	process (nucleic acid)	Template amount	template_amount	:ref:`PhysicalQuantity <PhysicalQuantityFields>`		important	Amount of template that went into the process	quantity: 1000, unit: { id: UO:0000024, label: nanogram}
3	process (nucleic acid)	Library generation method	library_generation_method	string	Controlled vocabulary: PCR, RT(RHP)+PCR, RT(oligo-dT)+PCR, RT(oligo-dT)+TS+PCR, RT(oligo-dT)+TS(UMI)+PCR, RT(specific)+PCR, RT(specific)+TS+PCR, RT(specific)+TS(UMI)+PCR, RT(specific+UMI)+PCR, RT(specific+UMI)+TS+PCR, RT(specific)+TS, other	essential	Generic type of library generation	RT(oligo-dT)+TS(UMI)+PCR
3	process (nucleic acid)	Library generation protocol	library_generation_protocol	string	free text	important	Description of processes applied to substrate to obtain a library that is ready for sequencing	cDNA was generated using
3	process (nucleic acid)	Protocol IDs	library_generation_kit_version	string	free text	important	When using a library generation protocol from a commercial provider, provide the protocol version number	v2.1 (2016-09-15)
3	process (nucleic acid)	Complete sequences	complete_sequences	string	Controlled vocabulary: partial, complete, complete+untemplated, mixed	essential	"To be considered `complete`, the procedure used for library construction MUST generate sequences that 1) include the first V gene codon that encodes the mature polypeptide chain (i.e. after the leader sequence) and 2) include the last complete codon of the J gene (i.e. 1 bp 5' of the J->C splice site) and 3) provide sequence information for all positions between 1) and 2). To be considered `complete+untemplated`, the sections of the sequences defined in points 1) to 3) of the previous sentence MUST be untemplated, i.e. MUST NOT overlap with the primers used in library preparation. `mixed` should only be used if the procedure used for library construction will likely produce multiple categories of sequences in the given experiment. It SHOULD NOT be used as a replacement for a NULL value.
"	partial
3	process (nucleic acid)	Physical linkage of different rearrangements	physical_linkage	string	Controlled vocabulary: none, hetero_head-head, hetero_tail-head, hetero_prelinked	essential	"In case an experimental setup is used that physically links nucleic acids derived from distinct `Rearrangements` before library preparation, this field describes the mode of that linkage. All `hetero_*` terms indicate that in case of paired-read sequencing, the two reads should be expected to map to distinct IG/TR loci. `*_head-head` refers to techniques that link the 5' ends of transcripts in a single-cell context. `*_tail-head` refers to techniques that link the 3' end of one transcript to the 5' end of another one in a single-cell context. This term does not provide any information whether a continuous reading-frame between the two is generated. `*_prelinked` refers to constructs in which the linkage was already present on the DNA level (e.g. scFv).
"	hetero_head-head
3	process (nucleic acid [pcr])	Target locus for PCR	pcr_target_locus	string	Controlled vocabulary: IGH, IGI, IGK, IGL, TRA, TRB, TRD, TRG	important	"Designation of the target locus. Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature.
"	IGK
3	process (nucleic acid [pcr])	Forward PCR primer target location	forward_pcr_primer_target_location	string	free text	important	Position of the most distal nucleotide templated by the forward primer or primer mix	IGHV, +23
3	process (nucleic acid [pcr])	Reverse PCR primer target location	reverse_pcr_primer_target_location	string	free text	important	Position of the most proximal nucleotide templated by the reverse primer or primer mix	IGHG, +57
3	process (sequencing)	Batch number	sequencing_run_id	string	free text	important	ID of sequencing run assigned by the sequencing facility	160101_M01234
3	process (sequencing)	Total reads passing QC filter	total_reads_passing_qc_filter	integer	positive integer	important	Number of usable reads for analysis	10365118
3	process (sequencing)	Sequencing platform	sequencing_platform	string	free text	important	Designation of sequencing instrument used	Alumina LoSeq 1000
3	process (sequencing)	Sequencing facility	sequencing_facility	string	free text	important	Name and address of sequencing facility	Seqs-R-Us, Vancouver, BC, Canada
3	process (sequencing)	Date of sequencing run	sequencing_run_date	string	free text	important	Date of sequencing run	2016-12-16
3	process (sequencing)	Sequencing kit	sequencing_kit	string	free text	important	Name, manufacturer, order and lot numbers of sequencing kit	FullSeq 600, Alumina, #M123456C0, 789G1HK
4	data (raw reads)	Raw sequencing data persistent identifier	sequencing_data_id	string	free text	important	"Persistent identifier of raw data stored in an archive (e.g. INSDC run ID). Data archive should be identified in the CURIE prefix.
"	SRA:SRR11610494
4	data (raw reads)	Raw sequencing data file type	file_type	string	Controlled vocabulary: fasta, fastq	important	File format for the raw reads or sequences	
4	data (raw reads)	Raw sequencing data file name	filename	string	free text	important	File name for the raw reads or sequences. The first file in paired-read sequencing.	MS10R-NMonson-C7JR9_S1_R1_001.fastq
4	data (raw reads)	Read direction	read_direction	string	Controlled vocabulary: forward, reverse, mixed	important	Read direction for the raw reads or sequences. The first file in paired-read sequencing.	forward
4	data (raw reads)	Forward read length	read_length	integer	positive integer	important	Read length in bases for the first file in paired-read sequencing	300
4	data (raw reads)	Paired raw sequencing data file name	paired_filename	string	free text	important	File name for the second file in paired-read sequencing	MS10R-NMonson-C7JR9_S1_R2_001.fastq
4	data (raw reads)	Paired read direction	paired_read_direction	string	Controlled vocabulary: forward, reverse, mixed	important	Read direction for the second file in paired-read sequencing	reverse
4	data (raw reads)	Paired read length	paired_read_length	integer	positive integer	important	Read length in bases for the second file in paired-read sequencing	300
5	process (computational)	Software tools and version numbers	software_versions	string	free text	important	Version number and / or date, include company pipelines	IgBLAST 1.6
5	process (computational)	Paired read assembly	paired_reads_assembly	string	free text	important	How paired end reads were assembled into a single receptor sequence	PandaSeq (minimal overlap 50, threshold 0.8)
5	process (computational)	Quality thresholds	quality_thresholds	string	free text	important	How/if sequences were removed from (4) based on base quality scores	Average Phred score >=20
5	process (computational)	Primer match cutoffs	primer_match_cutoffs	string	free text	important	How primers were identified in the sequences, were they removed/masked/etc?	Hamming distance <= 2
5	process (computational)	Collapsing method	collapsing_method	string	free text	important	The method used for combining multiple sequences from (4) into a single sequence in (5)	MUSCLE 3.8.31
5	process (computational)	Data processing protocols	data_processing_protocols	string	free text	important	General description of how QC is performed	Data was processed using [...]
5	process (computational)	V(D)J germline reference database	germline_database	string	free text	important	Source of germline V(D)J genes with version number or date accessed.	ENSEMBL, Homo sapiens build 90, 2017-10-01
6	data (processed sequence)	V gene with allele	v_call	string	free text	important	"V gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHV4-59*01 if using IMGT/GENE-DB).
"	IGHV4-59*01
6	data (processed sequence)	D gene with allele	d_call	string	free text	important	"First or only D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB).
"	IGHD3-10*01
6	data (processed sequence)	J gene with allele	j_call	string	free text	important	"J gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHJ4*02 if using IMGT/GENE-DB).
"	IGHJ4*02
6	data (processed sequence)	C region	c_call	string	free text	important	"Constant region gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHG1*01 if using IMGT/GENE-DB).
"	IGHG1*01
6	data (processed sequence)	IMGT-JUNCTION nucleotide sequence	junction	string	free text	important	"Junction region nucleotide sequence, where the junction is defined as the CDR3 plus the two flanking conserved codons.
"	TGTGCAAGAGCGGGAGTTTACGACGGATATACTATGGACTACTGG
6	data (processed sequence)	IMGT-JUNCTION amino acid sequence	junction_aa	string	free text	important	"Amino acid translation of the junction.
"	CARAGVYDGYTMDYW
6	data (processed sequence)	Read count	duplicate_count	integer	positive integer	important	"Copy number or number of duplicate observations for the query sequence. For example, the number of identical reads observed for this sequence.
"	123
6	data (processed sequence)	Cell index	cell_id	string	free text	important	"Identifier defining the cell of origin for the query sequence.
"	W06_046_091
