Alignment Schema (Experimental)

An Alignment is the output of a V(D)J assignment process for a single V, D, J, or C gene of a sequence. The assignment process need not perform a sequence alignment; the schema supports any algorithmic process. Multiple Alignment records are supported and expected for a single sequence. The context-dependent fields (score, identity, support, rank) are used to assess the quality of each assignment, and their definitions can vary considerably with the methodology used.

Note: this schema definition is still experimental and should not be considered final.

File Format Specification

The format specification describes the file format and how to structure the data.

Fields

| Name | Type | Attributes | Definition |
| --- | --- | --- | --- |
| sequence_id | string | required, identifier, nullable | Unique query sequence identifier within the file. Most often this will be the input sequence header or a substring thereof, but may also be a custom identifier defined by the tool in cases where query sequences have been combined in some fashion prior to alignment. |
| segment | string | required, nullable | The segment for this alignment. One of V, D, J or C. |
| rev_comp | boolean | optional, nullable | Alignment result is from the reverse complement of the query sequence. |
| call | string | required, nullable | Gene assignment with allele. |
| score | number | required, nullable | Alignment score. |
| identity | number | optional, nullable | Alignment fractional identity. |
| support | number | optional, nullable | Alignment E-value, p-value, likelihood, probability or other similar measure of support for the gene assignment as defined by the alignment tool. |
| cigar | string | required, nullable | Alignment CIGAR string. |
| sequence_start | integer | optional, nullable | Start position of the segment in the query sequence (1-based closed interval). |
| sequence_end | integer | optional, nullable | End position of the segment in the query sequence (1-based closed interval). |
| germline_start | integer | optional, nullable | Alignment start position in the reference sequence (1-based closed interval). |
| germline_end | integer | optional, nullable | Alignment end position in the reference sequence (1-based closed interval). |
| rank | integer | optional, nullable | Alignment rank. |
| rearrangement_id | string | DEPRECATED | Identifier for the Rearrangement object. May be identical to sequence_id, but will usually be a universally unique record locator for database applications. |
| data_processing_id | string | optional, nullable | Identifier to the data processing object in the repertoire metadata for this rearrangement. If this field is empty then the primary data processing object is assumed. |
| germline_database | string | DEPRECATED | Source of germline V(D)J genes with version number or date accessed. |
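The field types and the 1-based closed coordinates can be illustrated with a small reader. This is a minimal sketch under several assumptions that the text above does not confirm: the file is tab-separated, booleans are written as T/F, "required" means the column must be present, and "nullable" means a value may be empty. `parse_row` and `query_segment` are hypothetical helpers, and the example row is made up.

```python
import csv
import io

# Field name -> (Python type, required), following the field definitions
# above. "number" is read as float; deprecated fields are omitted.
FIELDS = {
    "sequence_id": (str, True),
    "segment": (str, True),
    "rev_comp": (bool, False),
    "call": (str, True),
    "score": (float, True),
    "identity": (float, False),
    "support": (float, False),
    "cigar": (str, True),
    "sequence_start": (int, False),
    "sequence_end": (int, False),
    "germline_start": (int, False),
    "germline_end": (int, False),
    "rank": (int, False),
    "data_processing_id": (str, False),
}


def parse_row(row):
    """Convert one row of strings into typed values; empty values become None."""
    missing = [name for name, (_, required) in FIELDS.items()
               if required and name not in row]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    parsed = {}
    for name, (kind, _) in FIELDS.items():
        raw = row.get(name)
        if raw is None or raw == "":
            parsed[name] = None
        elif kind is bool:
            # Assumption: booleans are encoded as T/F.
            parsed[name] = {"T": True, "F": False}[raw]
        else:
            parsed[name] = kind(raw)
    return parsed


def query_segment(sequence, record):
    """Slice the aligned segment out of the query sequence.

    sequence_start and sequence_end are 1-based and closed, so the Python
    slice starts at start - 1 and stops at end (exclusive).
    """
    return sequence[record["sequence_start"] - 1:record["sequence_end"]]


# Hypothetical one-row file; all values are made up.
text = (
    "sequence_id\tsegment\trev_comp\tcall\tscore\tcigar\tsequence_start\tsequence_end\n"
    "seq1\tJ\tF\tIGHJ4*02\t88.0\t6S6M\t7\t12\n"
)
record = parse_row(next(csv.DictReader(io.StringIO(text), delimiter="\t")))
print(query_segment("AAAAAACTGGGG", record))
```

The same offset applies to germline_start and germline_end when slicing a reference sequence.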