API Reference#
Rearrangement Interface#
- airr.read_rearrangement(filename, validate=False, debug=False)#
Open an iterator to read an AIRR rearrangements file
- Parameters:
filename (str) – path to the input file.
validate (bool) – whether to validate data as it is read, raising a ValidationError exception in the event of an error.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
iterable reader class.
- Return type:
airr.io.RearrangementReader
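A minimal sketch of consuming the iterator returned by read_rearrangement(). It assumes each row is a dict and that the productive field has already been converted to True, False, or None; summarize() and summarize_file() are hypothetical helper names, not part of the library.

```python
def summarize(rows):
    """Return (total, productive) counts for an iterable of row dicts."""
    total = 0
    productive = 0
    for row in rows:
        total += 1
        # Assumption: 'productive' is True, False, or None after parsing.
        if row.get("productive") is True:
            productive += 1
    return total, productive


def summarize_file(filename):
    """Apply summarize() to an AIRR rearrangements file (needs the airr package)."""
    import airr  # imported lazily so summarize() works without airr installed

    reader = airr.read_rearrangement(filename)
    try:
        return summarize(reader)
    finally:
        reader.close()
```

Keeping the counting logic separate from the file handling means it can be exercised on plain dicts before pointing it at a real file.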
- airr.create_rearrangement(filename, fields=None, debug=False)#
Create an empty AIRR rearrangements file writer
- Parameters:
filename (str) – output file path.
fields (list) – additional non-required fields to add to the output.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
open writer class.
- Return type:
airr.io.RearrangementWriter
- airr.derive_rearrangement(out_filename, in_filename, fields=None, debug=False)#
Create an empty AIRR rearrangements file with fields derived from an existing file
- Parameters:
out_filename (str) – output file path.
in_filename (str) – existing file to derive fields from.
fields (list) – additional non-required fields to add to the output.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
open writer class.
- Return type:
airr.io.RearrangementWriter
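As a sketch of how derive_rearrangement() might be combined with read_rearrangement(), the following copies a file while adding one extra column. The column name junction_length_calc and the helpers with_length() and annotate_file() are hypothetical, chosen only for illustration.

```python
def with_length(rows, source="junction", target="junction_length_calc"):
    """Yield copies of each row with the length of `source` stored in `target`."""
    for row in rows:
        out = dict(row)
        seq = row.get(source)
        # Empty or missing sequences get None rather than 0.
        out[target] = len(seq) if seq else None
        yield out


def annotate_file(in_filename, out_filename):
    """Write in_filename to out_filename with the extra column (needs airr)."""
    import airr  # imported lazily so with_length() works without airr installed

    reader = airr.read_rearrangement(in_filename)
    # The writer takes its fields from the input file plus the extra column.
    writer = airr.derive_rearrangement(
        out_filename, in_filename, fields=["junction_length_calc"]
    )
    try:
        for row in with_length(reader):
            writer.write(row)
    finally:
        reader.close()
        writer.close()
```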
- airr.load_rearrangement(filename, validate=False, debug=False)#
Load the contents of an AIRR rearrangements file into a data frame
- Parameters:
filename (str) – input file path.
validate (bool) – whether to validate data as it is read, raising a ValidationError exception in the event of an error.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
Rearrangement records as rows of a data frame.
- Return type:
pandas.DataFrame
- airr.dump_rearrangement(dataframe, filename, debug=False)#
Write the contents of a data frame to an AIRR rearrangements file
- Parameters:
dataframe (pandas.DataFrame) – data frame of rearrangement data.
filename (str) – output file path.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
True if the file is written without error.
- Return type:
bool
- airr.merge_rearrangement(out_filename, in_filenames, drop=False, debug=False)#
Merge one or more AIRR rearrangements files
- Parameters:
out_filename (str) – output file path.
in_filenames (list) – list of input files to merge.
drop (bool) – drop flag. If True then drop fields that do not exist in all input files, otherwise combine fields from all input files.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
True if files were successfully merged, otherwise False.
- Return type:
bool
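The effect of the drop flag can be pictured as a choice between the intersection and the union of the input headers. The sketch below is not the library's implementation; merged_fields() is a hypothetical helper, and the output ordering it produces is an assumption.

```python
def merged_fields(headers, drop=False):
    """Return the output field names for a list of input headers.

    headers: list of lists of field names, one list per input file.
    drop: if True keep only fields present in every input,
          otherwise combine the fields of all inputs.
    """
    if not headers:
        return []
    if drop:
        common = set(headers[0]).intersection(*headers[1:])
        # Assumption: keep the order of the first input file.
        return [name for name in headers[0] if name in common]
    combined = []
    for header in headers:
        for name in header:
            if name not in combined:
                combined.append(name)
    return combined
```

With drop=False a field found in only one input still appears in the output, so rows from the other inputs have no value for it.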
- airr.validate_rearrangement(filename, debug=False)#
Validates an AIRR rearrangements file
- Parameters:
filename (str) – path of the file to validate.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
True if the file passed validation, otherwise False.
- Return type:
bool
AIRR Data Model Interface#
- airr.read_airr(filename, format=None, validate=False, model=True, debug=False, check_nullable=True)#
Load an AIRR Data file
- Parameters:
filename (str) – path to the input file.
format (str) – input file format. Valid strings are “yaml” or “json”. If set to None, the file format will be automatically detected from the file extension.
validate (bool) – whether to validate data as it is read, raising a ValidationError exception in the event of a validation failure.
model (bool) – If True only validate objects defined in the AIRR DataFile schema. If False, attempt validation of all top-level objects. Ignored if validate=False.
debug (bool) – debug flag. If True print debugging information to standard error.
check_nullable (bool) – whether to check for nullable fields if validating the data.
- Returns:
dictionary of AIRR Data objects.
- Return type:
dict
- airr.write_airr(filename, data, format=None, info=None, validate=False, model=True, debug=False, check_nullable=True)#
Write an AIRR Data file
- Parameters:
filename (str) – path to the output file.
data (dict) – dictionary of AIRR Data Model objects.
format (str) – output file format. Valid strings are “yaml” or “json”. If set to None, the file format will be automatically detected from the file extension.
info (object) – info object to write. Will write current AIRR Schema info if not specified.
validate (bool) – whether to validate data before it is written, raising a ValidationError exception in the event of a validation failure.
model (bool) – If True only validate and write objects defined in the AIRR DataFile schema. If False, attempt validation and write of all top-level objects.
debug (bool) – debug flag. If True print debugging information to standard error.
check_nullable (bool) – whether to check for nullable fields if validating the data.
- Returns:
True if the file is written without error.
- Return type:
bool
- airr.validate_airr(data, model=True, debug=False, check_nullable=True)#
Validates the contents of an AIRR Data file
- Parameters:
data (dict) – dictionary containing AIRR Data Model objects.
model (bool) – If True only validate objects defined in the AIRR DataFile schema. If False, attempt validation of all top-level objects.
debug (bool) – debug flag. If True print debugging information to standard error.
check_nullable (bool) – whether to check for nullable fields when validating the data.
- Returns:
True if the data passed validation, otherwise False.
- Return type:
bool
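A short sketch of reading an AIRR Data file and validating the resulting dictionary. It assumes the dictionary holds a top-level "Repertoire" list whose entries carry a repertoire_id; repertoire_ids() and load_and_check() are hypothetical helpers.

```python
def repertoire_ids(data):
    """Collect repertoire_id values from a dictionary of AIRR Data objects."""
    # Assumption: repertoires are stored as a list under the "Repertoire" key.
    return [rep.get("repertoire_id") for rep in data.get("Repertoire", [])]


def load_and_check(filename):
    """Read a file, validate it, and list its repertoire ids (needs airr)."""
    import airr  # imported lazily so repertoire_ids() works without airr installed

    data = airr.read_airr(filename)   # format inferred from the file extension
    ok = airr.validate_airr(data)     # True or False, as documented above
    return ok, repertoire_ids(data)
```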
Classes#
- class airr.io.RearrangementReader(handle, base=1, validate=False, debug=False)#
Iterator for reading Rearrangement objects in TSV format
- fields#
field names in the input Rearrangement file.
- Type:
list
- external_fields#
list of fields in the input file that are not part of the Rearrangement definition.
- Type:
list
- __init__(handle, base=1, validate=False, debug=False)#
Initialization
- Parameters:
handle (file) – file handle of the open Rearrangement file.
base (int) – one of 0 or 1 specifying the coordinate schema in the input file. If 1, then the file is assumed to contain 1-based closed intervals that will be converted to python style 0-based half-open intervals for known fields. If 0, then values will be unchanged.
validate (bool) – perform validation. If True then basic validation will be performed while reading the data. A ValidationError exception will be raised if an error is found.
debug (bool) – debug state. If True prints debug information.
- Returns:
reader object.
- Return type:
airr.io.RearrangementReader
- __iter__()#
Iterator initializer
- Returns:
airr.io.RearrangementReader
- __next__()#
Next method
- Returns:
parsed Rearrangement data.
- Return type:
dict
- close()#
Closes the Rearrangement file
- next()#
Next method
- class airr.io.RearrangementWriter(handle, fields=None, base=1, debug=False)#
Writer class for Rearrangement objects in TSV format
- fields#
field names in the output Rearrangement file.
- Type:
list
- external_fields#
list of fields in the output file that are not part of the Rearrangement definition.
- Type:
list
- __init__(handle, fields=None, base=1, debug=False)#
Initialization
- Parameters:
handle (file) – file handle of the open Rearrangements file.
fields (list) – list of non-required fields to add. May include fields undefined by the schema.
base (int) – one of 0 or 1 specifying the coordinate schema in the output file. Data provided to the writer is assumed to be in python style 0-based half-open intervals. If 1, then data will be converted to 1-based closed intervals for known fields before writing. If 0, then values will be unchanged.
debug (bool) – debug state. If True prints debug information.
- Returns:
writer object.
- Return type:
airr.io.RearrangementWriter
- close()#
Closes the Rearrangement file
- write(row)#
Write a row to the Rearrangement file
- Parameters:
row (dict) – row to write.
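The base parameter of the reader and writer switches between 1-based closed and 0-based half-open intervals. The arithmetic is sketched below with two hypothetical helpers; only the start coordinate shifts, because a closed end at position n and a half-open end at index n cover the same last element.

```python
def to_zero_based(start, end):
    """Convert a 1-based closed interval to a 0-based half-open interval."""
    return start - 1, end


def to_one_based(start, end):
    """Convert a 0-based half-open interval to a 1-based closed interval."""
    return start + 1, end


sequence = "ACGTACGT"
# Positions 3 through 5 inclusive in 1-based coordinates.
start, end = to_zero_based(3, 5)
# Python slicing uses 0-based half-open indices directly.
segment = sequence[start:end]
```

After conversion the interval length is simply end - start, which is why the half-open form is convenient in Python.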
- class airr.schema.Schema(definition)#
AIRR schema definitions
- definition#
name of the schema definition.
- info#
schema info.
- Type:
collections.OrderedDict
- properties#
field definitions.
- Type:
collections.OrderedDict
- required#
list of mandatory fields.
- Type:
list
- optional#
list of non-required fields.
- Type:
list
- false_values#
accepted string values for False.
- Type:
list
- true_values#
accepted values for True.
- Type:
list
- from_bool(value, validate=False)#
Converts a boolean to a string
- Parameters:
value (bool) – logical value.
validate (bool) – when True raise a ValidationError for an invalid value. Otherwise, set invalid values to None.
- Returns:
conversion of True or False to ‘T’ or ‘F’.
- Return type:
str
- Raises:
airr.ValidationError – raised if value is invalid when validate is set True.
- pandas_types()#
Map of schema types to pandas types
- Returns:
mapping dictionary for pandas types.
- Return type:
dict
- spec(field)#
Get the properties for a field
- Parameters:
field (str) – field name.
- Returns:
definition for the field.
- Return type:
collections.OrderedDict
- template()#
Create an empty template object
- Returns:
dictionary with all schema properties set as None or an empty list.
- Return type:
collections.OrderedDict
- to_bool(value, validate=False)#
Convert a string to a boolean
- Parameters:
value (str) – logical value as a string.
validate (bool) – when True raise a ValidationError for an invalid value. Otherwise, set invalid values to None.
- Returns:
conversion of the string to True or False.
- Return type:
bool
- Raises:
airr.ValidationError – raised if value is invalid when validate is set True.
- to_float(value, validate=False)#
Converts a string to a float
- Parameters:
value (str) – float value as a string.
validate (bool) – when True raise a ValidationError for an invalid value. Otherwise, set invalid values to None.
- Returns:
conversion of the string to a float.
- Return type:
float
- Raises:
airr.ValidationError – raised if value is invalid when validate is set True.
- to_int(value, validate=False)#
Converts a string to an integer
- Parameters:
value (str) – integer value as a string.
validate (bool) – when True raise a ValidationError for an invalid value. Otherwise, set invalid values to None.
- Returns:
conversion of the string to an integer.
- Return type:
int
- Raises:
airr.ValidationError – raised if value is invalid when validate is set True.
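The converters above share one pattern: an invalid value becomes None unless validate is True, in which case an exception is raised. The sketch below mimics that pattern outside the library. The ValidationError class is a local stand-in, the accepted spellings are assumptions (the library's own lists are in true_values and false_values), and how the library treats empty strings is not covered here.

```python
class ValidationError(Exception):
    """Local stand-in for airr.ValidationError so the sketch runs without airr."""


# Assumed accepted spellings, for illustration only.
TRUE_VALUES = ["True", "true", "TRUE", "T", "t", "1"]
FALSE_VALUES = ["False", "false", "FALSE", "F", "f", "0"]


def to_bool(value, validate=False):
    """Convert a string to True or False; invalid input gives None or raises."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    if validate:
        raise ValidationError("invalid bool: %r" % (value,))
    return None


def to_int(value, validate=False):
    """Convert a string to an int; invalid input gives None or raises."""
    try:
        return int(value)
    except (TypeError, ValueError):
        if validate:
            raise ValidationError("invalid int: %r" % (value,))
        return None
```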
- type(field)#
Get the type for a field
- Parameters:
field (str) – field name.
- Returns:
the type definition for the field.
- Return type:
str
- validate_header(header)#
Validate header against the schema
- Parameters:
header (list) – list of header fields.
- Returns:
True if a ValidationError exception is not raised.
- Return type:
bool
- Raises:
airr.ValidationError – raised if header fails validation.
- validate_object(obj, missing=True, nonairr=True, context=None, check_nullable=True)#
Validate Repertoire object data against schema
- Parameters:
obj (dict) – dictionary containing a single repertoire object.
missing (bool) – provides warnings for missing optional fields.
nonairr (bool) – provides warnings for non-AIRR fields that cannot be validated.
context (str) – used by recursion to indicate place in object hierarchy.
check_nullable (bool) – check if data complies with the required fields as determined by the nullable flag.
- Returns:
True if a ValidationError exception is not raised.
- Return type:
bool
- Raises:
airr.ValidationError – raised if object fails validation.
- validate_row(row)#
Validate Rearrangements row data against schema
- Parameters:
row (dict) – dictionary containing a single record.
- Returns:
True if a ValidationError exception is not raised.
- Return type:
bool
- Raises:
airr.ValidationError – raised if row fails validation.
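A sketch of header checking, assuming that header validation includes confirming that every mandatory field is present. missing_required() and check_header() are hypothetical helpers; check_header() calls the documented validate_header() method on RearrangementSchema.

```python
def missing_required(header, required):
    """Return the required field names that are absent from the header."""
    present = set(header)
    return [name for name in required if name not in present]


def check_header(header):
    """Validate a header against the Rearrangement schema (needs airr)."""
    # Imported lazily so missing_required() works without airr installed.
    from airr.schema import RearrangementSchema, ValidationError

    try:
        return RearrangementSchema.validate_header(header)
    except ValidationError:
        # Report which mandatory fields are absent, if that is the cause.
        print(missing_required(header, RearrangementSchema.required))
        return False
```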
Schema#
- airr.schema.InfoSchema#
Schema object for the Info definition.
- airr.schema.DataFileSchema#
Schema object for the DataFile definition.
- airr.schema.AlignmentSchema#
Schema object for the Alignment definition.
- airr.schema.RearrangementSchema#
Schema object for the Rearrangement definition.
- airr.schema.RepertoireSchema#
Schema object for the Repertoire definition.
- airr.schema.GermlineSetSchema#
Schema object for the GermlineSet definition.
- airr.schema.GenotypeSetSchema#
Schema object for the GenotypeSet definition.
Deprecated#
- airr.load_repertoire(filename, validate=False, debug=False)#
Load an AIRR repertoire metadata file
- Parameters:
filename (str) – path to the input file.
validate (bool) – whether to validate data as it is read, raising a ValidationError exception in the event of an error.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
dictionary of AIRR Data objects.
- Return type:
dict
Deprecated since version 1.4: Use read_airr() instead.
- airr.write_repertoire(filename, repertoires, info=None, debug=False)#
Write an AIRR repertoire metadata file
- Parameters:
filename (str) – path to the output file.
repertoires (list) – array of repertoire objects.
info (object) – info object to write. Will write current AIRR Schema info if not specified.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
True if the file is written without error.
- Return type:
bool
Deprecated since version 1.4: Use write_airr() instead.
- airr.validate_repertoire(filename, debug=False)#
Validates an AIRR repertoire metadata file
- Parameters:
filename (str) – path of the file to validate.
debug (bool) – debug flag. If True print debugging information to standard error.
- Returns:
True if the file passed validation, otherwise False.
- Return type:
bool
Deprecated since version 1.4: Use validate_airr() instead.
- airr.repertoire_template()#
Return a blank repertoire object from the template. This object has the complete structure with all of the fields and all values set to None or empty string.
- Returns:
empty repertoire object.
- Return type:
object
Deprecated since version 1.4: Use schema.Schema.template() instead.