API Reference

Rearrangement Interface

airr.read_rearrangement(filename, validate=False, debug=False)

Open an iterator to read an AIRR rearrangements file

Parameters
  • filename (str) – path to the input file.

  • validate (bool) – whether to validate data as it is read, raising a ValidationError exception in the event of an error.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

iterable reader object.

Return type

airr.io.RearrangementReader
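The reader returned here is iterated like any Python iterator and yields one dict per row. The stdlib-only stand-in below roughly illustrates that protocol (`__iter__`, `__next__`, `close`, and a `fields` attribute). The class name `MiniRearrangementReader` is hypothetical. It leaves every value as a string, whereas the real airr.io.RearrangementReader may convert values according to the schema and adjust coordinates.

```python
import csv
import io


class MiniRearrangementReader:
    """Hypothetical stand-in: yields each TSV row as a dict of strings."""

    def __init__(self, handle):
        self.handle = handle
        self.dict_reader = csv.DictReader(handle, delimiter="\t")
        # Header fields, analogous to RearrangementReader.fields
        self.fields = self.dict_reader.fieldnames

    def __iter__(self):
        return self

    def __next__(self):
        # Propagates StopIteration at end of file, which ends a for-loop
        return next(self.dict_reader)

    def close(self):
        self.handle.close()


tsv = "sequence_id\tproductive\nseq1\tT\nseq2\tF\n"
reader = MiniRearrangementReader(io.StringIO(tsv))
rows = [row for row in reader]
reader.close()
print(reader.fields)           # ['sequence_id', 'productive']
print(rows[1]["sequence_id"])  # seq2
```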

airr.create_rearrangement(filename, fields=None, debug=False)

Create an empty AIRR rearrangements file writer

Parameters
  • filename (str) – output file path.

  • fields (list) – additional non-required fields to add to the output.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

open writer object.

Return type

airr.io.RearrangementWriter
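The header of the new file combines the schema's required fields with any extra fields passed in `fields`. The sketch below shows one way to build that header and write it. It assumes that required fields come first and that duplicates are dropped. The helper `build_header` and the short `REQUIRED` list are hypothetical; the real list comes from the Rearrangement schema's `required` attribute.

```python
import csv
import io

# Hypothetical subset of required Rearrangement fields, for illustration only
REQUIRED = ["sequence_id", "sequence", "rev_comp", "productive"]


def build_header(required, fields=None):
    """Required fields first, then extra fields not already present."""
    header = list(required)
    for name in fields or []:
        if name not in header:
            header.append(name)
    return header


header = build_header(REQUIRED, ["productive", "junction_length"])
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=header, delimiter="\t", lineterminator="\n")
writer.writeheader()  # an "empty" rearrangement file holds only the header row
print(header)
```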

airr.derive_rearrangement(out_filename, in_filename, fields=None, debug=False)

Create an empty AIRR rearrangements file with fields derived from an existing file

Parameters
  • out_filename (str) – output file path.

  • in_filename (str) – existing file to derive fields from.

  • fields (list) – additional non-required fields to add to the output.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

open writer object.

Return type

airr.io.RearrangementWriter
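Deriving a file starts from an existing header instead. The sketch below reads the header row of a tab-separated input and then appends any requested fields that are missing. The helper name `derive_header` is hypothetical. The assumption that extra fields are appended after the existing ones is for illustration only.

```python
import csv
import io


def derive_header(in_handle, fields=None):
    """Copy the input file's header, then append missing extra fields."""
    header = next(csv.reader(in_handle, delimiter="\t"))
    for name in fields or []:
        if name not in header:
            header.append(name)
    return header


existing = io.StringIO("sequence_id\tsequence\tmy_score\nseq1\tACGT\t0.9\n")
print(derive_header(existing, ["my_score", "clone_id"]))
# ['sequence_id', 'sequence', 'my_score', 'clone_id']
```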

airr.load_rearrangement(filename, validate=False, debug=False)

Load the contents of an AIRR rearrangements file into a data frame

Parameters
  • filename (str) – input file path.

  • validate (bool) – whether to validate data as it is read, raising a ValidationError exception in the event of an error.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

Rearrangement records as rows of a data frame.

Return type

pandas.DataFrame

airr.dump_rearrangement(dataframe, filename, debug=False)

Write the contents of a data frame to an AIRR rearrangements file

Parameters
  • dataframe (pandas.DataFrame) – data frame of rearrangement data.

  • filename (str) – output file path.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

True if the file is written without error.

Return type

bool

airr.merge_rearrangement(out_filename, in_filenames, drop=False, debug=False)

Merge one or more AIRR rearrangements files

Parameters
  • out_filename (str) – output file path.

  • in_filenames (list) – list of input files to merge.

  • drop (bool) – drop flag. If True then drop fields that do not exist in all input files, otherwise combine fields from all input files.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

True if files were successfully merged, otherwise False.

Return type

bool
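The `drop` flag decides which columns the merged file receives. This amounts to the union of all input headers (`drop=False`) or their intersection (`drop=True`). The helper `merge_fields` below is hypothetical and assumes first-seen column order; the library's actual column ordering may differ.

```python
def merge_fields(headers, drop=False):
    """Combine field lists from several files, preserving first-seen order.

    drop=False keeps the union of all fields; drop=True keeps only the
    fields present in every header.
    """
    merged = []
    for header in headers:
        for name in header:
            if name not in merged:
                merged.append(name)
    if drop:
        merged = [name for name in merged if all(name in h for h in headers)]
    return merged


a = ["sequence_id", "sequence", "clone_id"]
b = ["sequence_id", "sequence", "cell_id"]
print(merge_fields([a, b]))             # ['sequence_id', 'sequence', 'clone_id', 'cell_id']
print(merge_fields([a, b], drop=True))  # ['sequence_id', 'sequence']
```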

airr.validate_rearrangement(filename, debug=False)

Validates an AIRR rearrangements file

Parameters
  • filename (str) – path of the file to validate.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

True if the file passed validation, otherwise False.

Return type

bool

AIRR Data Model Interface

airr.read_airr(filename, format=None, validate=False, model=True, debug=False)

Load an AIRR Data file

Parameters
  • filename (str) – path to the input file.

  • format (str) – input file format. Valid strings are “yaml” or “json”. If set to None, the file format will be automatically detected from the file extension.

  • validate (bool) – whether to validate data as it is read, raising a ValidationError exception in the event of a validation failure.

  • model (bool) – If True only validate objects defined in the AIRR DataFile schema. If False, attempt validation of all top-level objects. Ignored if validate=False.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

dictionary of AIRR Data objects.

Return type

dict
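When `format=None`, the format is inferred from the file extension. The sketch below shows one way to do this. The helper name `detect_format`, the mapping of `.yaml`/`.yml` to YAML and `.json` to JSON, and the ValueError raised for unknown extensions are all assumptions, not the library's documented behavior.

```python
import os


def detect_format(filename, fmt=None):
    """Hypothetical helper: pick 'yaml' or 'json' from the file extension."""
    if fmt is not None:
        return fmt  # an explicit format always wins
    ext = os.path.splitext(filename)[1].lower()
    if ext in (".yaml", ".yml"):
        return "yaml"
    if ext == ".json":
        return "json"
    raise ValueError(f"cannot detect format from extension: {ext!r}")


print(detect_format("study.yaml"), detect_format("STUDY.JSON"))  # yaml json
```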

airr.write_airr(filename, data, format=None, info=None, validate=False, model=True, debug=False)

Write an AIRR Data file

Parameters
  • filename (str) – path to the output file.

  • data (dict) – dictionary of AIRR Data Model objects.

  • format (str) – output file format. Valid strings are “yaml” or “json”. If set to None, the file format will be automatically detected from the file extension.

  • info (object) – info object to write. Will write current AIRR Schema info if not specified.

  • validate (bool) – whether to validate data before it is written, raising a ValidationError exception in the event of a validation failure.

  • model (bool) – If True only validate and write objects defined in the AIRR DataFile schema. If False, attempt validation and write of all top-level objects

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

True if the file is written without error.

Return type

bool

airr.validate_airr(data, model=True, debug=False)

Validates a dictionary of AIRR Data Model objects

Parameters
  • data (dict) – dictionary containing AIRR Data Model objects

  • model (bool) – If True only validate objects defined in the AIRR DataFile schema. If False, attempt validation of all top-level objects

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

True if the data passed validation, otherwise False.

Return type

bool

Classes

class airr.io.RearrangementReader(handle, base=1, validate=False, debug=False)

Iterator for reading Rearrangement objects in TSV format

fields

field names in the input Rearrangement file.

Type

list

external_fields

list of fields in the input file that are not part of the Rearrangement definition.

Type

list

__init__(handle, base=1, validate=False, debug=False)

Initialization

Parameters
  • handle (file) – file handle of the open Rearrangement file.

  • base (int) – one of 0 or 1 specifying the coordinate schema in the input file. If 1, then the file is assumed to contain 1-based closed intervals that will be converted to Python-style 0-based half-open intervals for known fields. If 0, then values will be unchanged.

  • validate (bool) – perform validation. If True then basic validation will be performed while reading the data. A ValidationError exception will be raised if an error is found.

  • debug (bool) – debug state. If True prints debug information.

Returns

reader object.

Return type

airr.io.RearrangementReader
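With `base=1`, the 1-based closed interval [start, end] becomes the 0-based half-open interval [start - 1, end). Only the start coordinate changes. The sketch below shows this arithmetic and checks it against a Python slice. The helper `to_zero_based` is hypothetical, and passing the start fields explicitly stands in for the reader's own knowledge of which fields hold coordinates.

```python
def to_zero_based(row, start_fields):
    """Convert 1-based closed starts to 0-based half-open; ends are unchanged."""
    out = dict(row)
    for name in start_fields:
        if out.get(name) is not None:
            out[name] = out[name] - 1
    return out


sequence = "ACGTACGT"
row = {"v_sequence_start": 3, "v_sequence_end": 6}  # 1-based closed: bases 3..6
converted = to_zero_based(row, ["v_sequence_start"])
start, end = converted["v_sequence_start"], converted["v_sequence_end"]
print(start, end)           # 2 6
print(sequence[start:end])  # GTAC
```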

__iter__()

Iterator initializer

Returns

airr.io.RearrangementReader

__next__()

Next method

Returns

parsed Rearrangement data.

Return type

dict

close()

Closes the Rearrangement file

next()

Next method

class airr.io.RearrangementWriter(handle, fields=None, base=1, debug=False)

Writer class for Rearrangement objects in TSV format

fields

field names in the output Rearrangement file.

Type

list

external_fields

list of fields in the output file that are not part of the Rearrangement definition.

Type

list

__init__(handle, fields=None, base=1, debug=False)

Initialization

Parameters
  • handle (file) – file handle of the open Rearrangements file.

  • fields (list) – list of non-required fields to add. May include fields undefined by the schema.

  • base (int) – one of 0 or 1 specifying the coordinate schema in the output file. Data provided to write() is assumed to be in Python-style 0-based half-open intervals. If 1, then data will be converted to 1-based closed intervals for known fields before writing. If 0, then values will be unchanged.

  • debug (bool) – debug state. If True prints debug information.

Returns

writer object.

Return type

airr.io.RearrangementWriter
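Writing with `base=1` applies the inverse conversion. A Python slice [start:end] becomes the closed interval [start + 1, end], so again only the start changes. The helper `to_one_based` is hypothetical and takes the start fields explicitly.

```python
def to_one_based(row, start_fields):
    """Convert 0-based half-open starts back to 1-based closed; ends unchanged."""
    out = dict(row)
    for name in start_fields:
        if out.get(name) is not None:
            out[name] = out[name] + 1
    return out


row = {"v_sequence_start": 2, "v_sequence_end": 6}  # Python slice [2:6]
print(to_one_based(row, ["v_sequence_start"]))
# {'v_sequence_start': 3, 'v_sequence_end': 6}
```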

close()

Closes the Rearrangement file

write(row)

Write a row to the Rearrangement file

Parameters

row (dict) – row to write.

class airr.schema.Schema(definition)

AIRR schema definitions

definition

name of the schema definition.

info

schema info.

Type

collections.OrderedDict

properties

field definitions.

Type

collections.OrderedDict

required

list of mandatory fields.

Type

list

optional

list of non-required fields.

Type

list

false_values

accepted string values for False.

Type

list

true_values

accepted string values for True.

Type

list

from_bool(value, validate=False)

Converts a boolean to a string

Parameters
  • value (bool) – logical value.

  • validate (bool) – when True raise a ValidationError for an invalid value. Otherwise, set invalid values to None.

Returns

conversion of True or False to ‘T’ or ‘F’.

Return type

str

Raises

airr.ValidationError – raised if value is invalid when validate is set True.
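The documented behavior of from_bool can be restated as a standalone function. The local `ValidationError` class stands in for airr.ValidationError. Treating only genuine bool values as valid is an assumption of this sketch; the library may accept other inputs.

```python
class ValidationError(Exception):
    """Stand-in for airr.ValidationError in this sketch."""


def from_bool(value, validate=False):
    """Hypothetical sketch: True -> 'T', False -> 'F', anything else invalid."""
    if value is True:
        return "T"
    if value is False:
        return "F"
    if validate:
        raise ValidationError(f"invalid logical value: {value!r}")
    return None  # invalid values become None when validate is False


print(from_bool(True), from_bool(False), from_bool("maybe"))  # T F None
```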

pandas_types()

Map of schema types to pandas types

Returns

mapping dictionary for pandas types

Return type

dict

spec(field)

Get the properties for a field

Parameters

field (str) – field name.

Returns

definition for the field.

Return type

collections.OrderedDict

template()

Create an empty template object

Returns

dictionary with all schema properties set as None or an empty list.

Return type

collections.OrderedDict
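template() fills every schema property with an empty value. The sketch below works over a hand-written properties mapping. The property fragment is hypothetical, and the rule that array-typed properties become empty lists while all others become None is an assumption about how "None or an empty list" is decided.

```python
from collections import OrderedDict

# Hypothetical fragment of a schema's properties, for illustration only
properties = OrderedDict([
    ("repertoire_id", {"type": "string"}),
    ("sample", {"type": "array"}),
    ("data_processing", {"type": "array"}),
])


def template(props):
    """Return an OrderedDict with [] for array properties and None otherwise."""
    return OrderedDict(
        (name, [] if definition.get("type") == "array" else None)
        for name, definition in props.items()
    )


print(dict(template(properties)))
# {'repertoire_id': None, 'sample': [], 'data_processing': []}
```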

to_bool(value, validate=False)

Convert a string to a boolean

Parameters
  • value (str) – logical value as a string.

  • validate (bool) – when True raise a ValidationError for an invalid value. Otherwise, set invalid values to None.

Returns

conversion of the string to True or False.

Return type

bool

Raises

airr.ValidationError – raised if value is invalid when validate is set True.
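to_bool checks the input against the schema's true_values and false_values lists. In the sketch below, the two lists are assumed spellings (T/F and true/false variants). Consult Schema.true_values and Schema.false_values for the values the library actually accepts. `ValidationError` is a local stand-in.

```python
class ValidationError(Exception):
    """Stand-in for airr.ValidationError in this sketch."""


# Assumed accepted spellings; the real lists live on the Schema object
TRUE_VALUES = ["T", "True", "TRUE", "true", "t"]
FALSE_VALUES = ["F", "False", "FALSE", "false", "f"]


def to_bool(value, validate=False):
    """Hypothetical sketch: map accepted strings to True/False."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    if validate:
        raise ValidationError(f"invalid logical value: {value!r}")
    return None  # invalid values become None when validate is False


print(to_bool("T"), to_bool("false"), to_bool("yes"))  # True False None
```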

to_float(value, validate=False)

Converts a string to a float

Parameters
  • value (str) – float value as a string.

  • validate (bool) – when True raise a ValidationError for an invalid value. Otherwise, set invalid values to None.

Returns

conversion of the string to a float.

Return type

float

Raises

airr.ValidationError – raised if value is invalid when validate is set True.

to_int(value, validate=False)

Converts a string to an integer

Parameters
  • value (str) – integer value as a string.

  • validate (bool) – when True raise a ValidationError for an invalid value. Otherwise, set invalid values to None.

Returns

conversion of the string to an integer.

Return type

int

Raises

airr.ValidationError – raised if value is invalid when validate is set True.
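to_float and to_int share the same pattern: cast the string, and on failure either raise or fall back to None. A single hypothetical helper, `to_number`, covers both below. Treating None and the empty string as missing values (mapped to None even when validating) is an assumption of this sketch.

```python
class ValidationError(Exception):
    """Stand-in for airr.ValidationError in this sketch."""


def to_number(value, cast, validate=False):
    """Hypothetical sketch of to_int/to_float: cast, or fall back to None."""
    # Assumption: None or an empty string means "missing" and maps to None
    if value is None or value == "":
        return None
    try:
        return cast(value)
    except (TypeError, ValueError):
        if validate:
            raise ValidationError(f"invalid {cast.__name__} value: {value!r}") from None
        return None


print(to_number("42", int), to_number("0.5", float), to_number("4.2", int))
# 42 0.5 None
```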

type(field)

Get the type for a field

Parameters

field (str) – field name.

Returns

the type definition for the field

Return type

str

validate_header(header)

Validate header against the schema

Parameters

header (list) – list of header fields.

Returns

True if a ValidationError exception is not raised.

Return type

bool

Raises

airr.ValidationError – raised if header fails validation.
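Header validation can be sketched as a check that every required field appears in the header. This is an assumption; the real method may check more. The `REQUIRED` tuple is a hypothetical subset of the Rearrangement schema's required fields, and `ValidationError` is a local stand-in.

```python
class ValidationError(Exception):
    """Stand-in for airr.ValidationError in this sketch."""


# Hypothetical subset of required fields, for illustration only
REQUIRED = ("sequence_id", "sequence", "rev_comp", "productive")


def validate_header(header, required=REQUIRED):
    """Raise if any required field is absent; otherwise return True."""
    missing = [name for name in required if name not in header]
    if missing:
        raise ValidationError(f"missing required fields: {', '.join(missing)}")
    return True


print(validate_header(list(REQUIRED) + ["clone_id"]))  # True
```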

validate_object(obj, missing=True, nonairr=True, context=None)

Validate Repertoire object data against schema

Parameters
  • obj (dict) – dictionary containing a single repertoire object.

  • missing (bool) – provides warnings for missing optional fields.

  • nonairr (bool) – provides warnings for non-AIRR fields that cannot be validated.

  • context (str) – used during recursion to indicate the current position in the object hierarchy.

Returns

True if a ValidationError exception is not raised.

Return type

bool

Raises

airr.ValidationError – raised if object fails validation.

validate_row(row)

Validate Rearrangements row data against schema

Parameters

row (dict) – dictionary containing a single record.

Returns

True if a ValidationError exception is not raised.

Return type

bool

Raises

airr.ValidationError – raised if row fails validation.

Schema

airr.schema.InfoSchema

Schema object for the Info definition. A module-level instance of airr.schema.Schema that exposes the attributes documented for that class: definition, info, properties, required, optional, false_values, and true_values.

airr.schema.DataFileSchema

Schema object for the DataFile definition. A module-level instance of airr.schema.Schema that exposes the attributes documented for that class: definition, info, properties, required, optional, false_values, and true_values.

airr.schema.AlignmentSchema

Schema object for the Alignment definition. A module-level instance of airr.schema.Schema that exposes the attributes documented for that class: definition, info, properties, required, optional, false_values, and true_values.

airr.schema.RearrangementSchema

Schema object for the Rearrangement definition. A module-level instance of airr.schema.Schema that exposes the attributes documented for that class: definition, info, properties, required, optional, false_values, and true_values.

airr.schema.RepertoireSchema

Schema object for the Repertoire definition. A module-level instance of airr.schema.Schema that exposes the attributes documented for that class: definition, info, properties, required, optional, false_values, and true_values.

airr.schema.GermlineSetSchema

Schema object for the GermlineSet definition. A module-level instance of airr.schema.Schema that exposes the attributes documented for that class: definition, info, properties, required, optional, false_values, and true_values.

airr.schema.GenotypeSetSchema

Schema object for the GenotypeSet definition. A module-level instance of airr.schema.Schema that exposes the attributes documented for that class: definition, info, properties, required, optional, false_values, and true_values.

Deprecated

airr.load_repertoire(filename, validate=False, debug=False)

Load an AIRR repertoire metadata file

Parameters
  • filename (str) – path to the input file.

  • validate (bool) – whether to validate data as it is read, raising a ValidationError exception in the event of an error.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

dictionary of AIRR Data objects.

Return type

dict

Deprecated since version 1.4: Use read_airr() instead.

airr.write_repertoire(filename, repertoires, info=None, debug=False)

Write an AIRR repertoire metadata file

Parameters
  • filename (str) – path to the output file.

  • repertoires (list) – list of repertoire objects.

  • info (object) – info object to write. Will write current AIRR Schema info if not specified.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

True if the file is written without error.

Return type

bool

Deprecated since version 1.4: Use write_airr() instead.

airr.validate_repertoire(filename, debug=False)

Validates an AIRR repertoire metadata file

Parameters
  • filename (str) – path of the file to validate.

  • debug (bool) – debug flag. If True print debugging information to standard error.

Returns

True if the file passed validation, otherwise False.

Return type

bool

Deprecated since version 1.4: Use validate_airr() instead.

airr.repertoire_template()

Return a blank repertoire object from the template. This object has the complete structure with all of the fields and all values set to None or empty string.

Returns

empty repertoire object.

Return type

object

Deprecated since version 1.4: Use schema.Schema.template() instead.