BanzaiDB package

Submodules

BanzaiDB.banzaidb module

BanzaiDB v0.1.2 - Database tool for the Banzai NGS pipeline (http://github.com/mscook/BanzaiDB)

BanzaiDB.banzaidb.create_parser()

Create the CLI parser

Returns: a parser with subparsers: init, populate, update & query
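
A minimal sketch of how a parser with these four subparsers could be built with argparse, assuming each subcommand routes to a handler through set_defaults(func=...) and that main() simply calls that handler. The handler stubs, the prog name and the options shown (--force, run_type, run_path, and the run_type choices) are hypothetical, loosely drawn from the parameter descriptions in this module rather than from the actual BanzaiDB definitions.

```python
import argparse


def _stub(name):
    # Hypothetical placeholder handler: returns the subcommand name and the parsed args.
    def handler(args):
        return name, args
    return handler


def create_parser():
    """Sketch of a CLI parser with init, populate, update & query subcommands."""
    parser = argparse.ArgumentParser(prog="banzaidb")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # a subcommand must be given

    init = subparsers.add_parser("init", help="create the database and default tables")
    init.add_argument("--force", action="store_true", help="overwrite an existing database")
    init.set_defaults(func=_stub("init"))

    populate = subparsers.add_parser("populate", help="populate the database with a run")
    populate.add_argument("run_type",
                          choices=["qc", "mapping", "assembly", "ordering", "annotation"])
    populate.add_argument("run_path", help="full path to the Banzai run")
    populate.set_defaults(func=_stub("populate"))

    update = subparsers.add_parser("update", help="update the database")
    update.set_defaults(func=_stub("update"))

    query = subparsers.add_parser("query", help="list or run a ReQL query")
    query.set_defaults(func=_stub("query"))
    return parser


def main(argv=None):
    # Parse the command line, then hand off to the selected subcommand's handler.
    args = create_parser().parse_args(argv)
    return args.func(args)
```

For example, main(["populate", "mapping", "/tmp/run"]) dispatches to the populate handler with args.run_type == "mapping". Routing through set_defaults keeps main() free of if/elif chains over subcommand names.
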
BanzaiDB.banzaidb.db_query(args)

List available (via fab) or provide a ReQL query function

BanzaiDB.banzaidb.init_database_with_default_tables(args)

Create a new RethinkDB database

Parameters: args – the parsed argparse arguments (uses force)
BanzaiDB.banzaidb.main()

Main function - essentially calls the CLI parser & directs execution

BanzaiDB.banzaidb.populate_annotation()

Populate the database with an annotation run

BanzaiDB.banzaidb.populate_assembly()

Populate the database with an assembly run

BanzaiDB.banzaidb.populate_database_with_data(args)

Populate the RethinkDB database

This is essentially a placeholder that directs the input data to its specific populate method.

Parameters: args – the parsed argparse arguments (uses run_type)
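
The routing described here can be pictured as a dictionary lookup keyed on run_type. A minimal sketch, assuming the run_type values are named after the populate_* functions in this module; the handlers below are hypothetical stand-ins that only report what they would load.

```python
from argparse import Namespace


# Hypothetical stand-ins for the real populate_* loaders.
def populate_qc():
    return "qc"


def populate_mapping(args):
    return "mapping: " + args.run_path  # the real loader reads the run at run_path


def populate_assembly():
    return "assembly"


def populate_ordering():
    return "ordering"


def populate_annotation():
    return "annotation"


def populate_database_with_data(args):
    """Direct the input data to the populate method matching args.run_type (sketch)."""
    handlers = {
        "qc": lambda a: populate_qc(),
        "mapping": populate_mapping,
        "assembly": lambda a: populate_assembly(),
        "ordering": lambda a: populate_ordering(),
        "annotation": lambda a: populate_annotation(),
    }
    try:
        handler = handlers[args.run_type]
    except KeyError:
        raise ValueError("Unknown run_type: %r" % args.run_type) from None
    return handler(args)
```

Usage: populate_database_with_data(Namespace(run_type="mapping", run_path="/data/run")) returns "mapping: /data/run". An unknown run_type fails loudly instead of silently doing nothing.
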
BanzaiDB.banzaidb.populate_mapping(args)

Populate the database with a mapping run. Only Nesoni runs are supported at the moment.

TODO: This should also handle BWA. Will need to differentiate between Nesoni & BWA runs and handle VCF files.

For faster DB inserts:
  • use a batch size of about 200 docs,
  • increase concurrency,
  • use durability="soft" (data can be lost if the power goes off during inserts),
  • run(durability="soft", noreply=True) == danger!
Parameters: args – the parsed argparse arguments (uses run_path), where run_path is the full path as a string to the Banzai run (inclusive of $PROJECTBASE). For example: /$PROJECTBASE/map/$REF.2014-04-28-mon-16-41-51
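
The batch-size advice amounts to splitting the parsed documents into chunks of about 200 before each insert. A minimal, database-free sketch of that chunking step; the commented-out RethinkDB call only shows where a batch would be written, and the table name "SNP" and the variables all_docs and conn are hypothetical.

```python
from itertools import islice


def batched_docs(docs, batch_size=200):
    """Yield successive lists of at most batch_size documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Where a batch would be written (assumes `r` is the RethinkDB query module, `conn` an
# open connection and "SNP" a hypothetical table; soft durability trades safety for speed):
# for batch in batched_docs(all_docs):
#     r.table("SNP").insert(batch, durability="soft").run(conn)
```

Because batched_docs consumes any iterable lazily, a large run can be streamed from disk without holding every document in memory at once.
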
BanzaiDB.banzaidb.populate_ordering()

Populate the database with an ordering run

BanzaiDB.banzaidb.populate_qc()

Populate the database with a QC run

BanzaiDB.banzaidb.updateDB(args)

Update a DB -> should this be possible?

Perhaps update should mean add a “run” and make it the active one?

BanzaiDB.config module

class BanzaiDB.config.BanzaiDBConfig

Bases: object

BanzaiDB configuration class

dump_items()

Returns a string of all configuration options

Returns: a newline-delimited string of all config options
read_config()

Read a BanzaiDB configuration file

Currently only supports:
  • db_host = [default = localhost]
  • port = [default = 28015]
  • db_name = [default = Banzai]
  • auth_key = [default = '']

Note

Updated so that "port" is stored as an integer.
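
A minimal sketch of the described behaviour, written as plain functions rather than methods of BanzaiDBConfig. It assumes a simple "key = value" file with one option per line and "#" comments; the real file format and location are not documented here. It applies the listed defaults, stores port as an integer, and mirrors dump_items() by returning newline-delimited options.

```python
DEFAULTS = {"db_host": "localhost", "port": 28015, "db_name": "Banzai", "auth_key": ""}


def read_config(text):
    """Parse 'key = value' lines over the defaults; port is converted to int."""
    config = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key in config:  # unknown keys are ignored in this sketch
            config[key] = value
    config["port"] = int(config["port"])
    return config


def dump_items(config):
    """Return a newline-delimited string of all config options."""
    return "\n".join("%s = %s" % (key, config[key]) for key in sorted(config))
```

Converting port once at read time means every caller (for example a connection helper) receives an int, whichever source supplied the value.
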

BanzaiDB.converters module

BanzaiDB.converters.convert_from_JSON_to_CSV(json_data, header=False)

Converts a single JSON element to CSV

Note

This will not handle nested JSON. Something like https://github.com/evidens/json2csv would be needed to achieve this.

Parameters:
  • json_data – the JSON
  • header – [optional] include and return the header
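
A minimal sketch of a single-element conversion with the standard csv module, assuming json_data is one flat JSON object (a dict, or a string that decodes to one). The two exception classes are local stand-ins for the documented BanzaiDB.errors exceptions, and writing columns in sorted key order is an assumption.

```python
import csv
import io
import json


class CouldNotParseJSONError(Exception):
    """Stand-in for BanzaiDB.errors.CouldNotParseJSONError."""


class NestedJSONError(Exception):
    """Stand-in for BanzaiDB.errors.NestedJSONError."""


def convert_from_JSON_to_CSV(json_data, header=False):
    """Convert one flat JSON element to a CSV row, optionally preceded by a header row."""
    if isinstance(json_data, str):
        json_data = json.loads(json_data)
    if not isinstance(json_data, dict):
        raise CouldNotParseJSONError("expected a single JSON element, not a list")
    if any(isinstance(value, (dict, list)) for value in json_data.values()):
        raise NestedJSONError("nested JSON is not supported")
    fields = sorted(json_data)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header:
        writer.writerow(fields)
    writer.writerow([json_data[field] for field in fields])
    return out.getvalue()
```

Using csv.writer rather than joining strings by hand means values containing commas or quotes are escaped correctly.
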
BanzaiDB.converters.convert_from_csv_to_JSON(csv_data, header=False)

Converts from CSV to JSON

Not implemented yet!

Parameters:
  • csv_data – the CSV data
  • header – [optional]
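
Since this function is not implemented yet, here is one possible shape for it, purely as an assumption: with header=True the first CSV row supplies the keys; otherwise the keys are the column indices as strings (a hypothetical convention). Each row becomes a JSON object and the result is a JSON array.

```python
import csv
import io
import json


def convert_from_csv_to_JSON(csv_data, header=False):
    """Hypothetical sketch: turn CSV text into a JSON array of objects."""
    rows = list(csv.reader(io.StringIO(csv_data)))
    if not rows:
        return "[]"
    if header:
        keys, rows = rows[0], rows[1:]
    else:
        # Assumed convention when there is no header: key each column by its index.
        keys = [str(i) for i in range(len(rows[0]))]
    return json.dumps([dict(zip(keys, row)) for row in rows])
```

All values come back as strings, since CSV carries no type information; any numeric conversion would need a schema such as the one described in tables.rst.
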

BanzaiDB.core module

BanzaiDB.core.nesoni_report_to_JSON(report_file)

Convert a nesoni report.txt to JSON

All features in the report are parsed.

See: tables.rst

Parameters: report_file – the full path (as a string) to the report file
Returns: a list of JSON
BanzaiDB.core.reference_genome_features_to_JSON(genome_file)

From genome reference (GBK format) convert CDS, gene & RNA features to JSON

The following 2 are really good resources:

Note

Also see tables.rst for a detailed description of the JSON schema.

Warning

This probably does not handle misc_features.

Parameters: genome_file – the full path (as a string) to the GenBank file
Returns: a JSON representing the reference and a list of JSON containing information on the features

BanzaiDB.database module

BanzaiDB.database.make_connection()

Make a connection to the RethinkDB database

Pulls settings (host, port, database name & auth_key) from BanzaiDBConfig()

Note

The RethinkDB connection is a context manager, so use this function like 'with make_connection():'
Returns: a connection context manager

BanzaiDB.errors module

exception BanzaiDB.errors.CouldNotParseJSONError(code)

Bases: exceptions.Exception

The conversion only takes a single JSON element, not a list of elements

exception BanzaiDB.errors.InvalidDBName(code)

Bases: exceptions.Exception

RethinkDB only likes database names that match "^[a-zA-Z0-9_]+$"
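
A quick way to check a name against that pattern before creating the database. This sketch uses re.fullmatch instead of anchoring with ^ and $, because a bare $ also matches just before a trailing newline and would let "Banzai\n" through. The validate_db_name helper is hypothetical, and the exception class is a local stand-in for BanzaiDB.errors.InvalidDBName.

```python
import re

DB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_]+")


class InvalidDBName(Exception):
    """Stand-in for BanzaiDB.errors.InvalidDBName."""


def validate_db_name(name):
    """Return name unchanged if it is made only of letters, digits and underscores."""
    if DB_NAME_PATTERN.fullmatch(name) is None:
        raise InvalidDBName(name)
    return name
```
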

exception BanzaiDB.errors.NestedJSONError(code)

Bases: exceptions.Exception

The conversion of JSON to CSV does not support nested JSON

BanzaiDB.fetch module

BanzaiDB.fetch.get_genbank(nucleotide_db_id)

Given a complete genome identifier (NCBI), return the genome (SeqIO) object

BanzaiDB.imaging module

BanzaiDB.imaging.gen_colors(number)

Generate a list of distinct "good" random colors, of length number

Based on http://martin.ankerl.com/2009/12/09/how-to-create-random-colors-programmatically/
Parameters: number – the number of colors to generate
Type: int
Return type: a list of lists in the form: [[243, 137, 121], [232, 121, 243], [216, 121, 243]]
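
A minimal sketch of the golden-ratio approach from the linked post: start at a random hue, step the hue by the golden ratio conjugate (modulo 1) for each new color, and convert with fixed saturation and value. The choice of s=0.5, v=0.95 and scaling each channel by 256 is an assumption. It appears consistent with the example output, since 0.95 × 256 truncates to 243 and 0.475 × 256 truncates to 121, but it is not confirmed by the source. The seed parameter is a hypothetical addition for reproducibility.

```python
import colorsys
import random

GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def gen_colors(number, s=0.5, v=0.95, seed=None):
    """Return `number` [r, g, b] lists whose hues are spread by the golden ratio."""
    rng = random.Random(seed)
    h = rng.random()  # random starting hue in [0, 1)
    colors = []
    for _ in range(number):
        h = (h + GOLDEN_RATIO_CONJUGATE) % 1  # step round the color wheel
        r, g, b = colorsys.hsv_to_rgb(h, s, v)
        colors.append([int(r * 256), int(g * 256), int(b * 256)])
    return colors
```

Stepping by an irrational fraction of the wheel keeps successive hues far apart without ever repeating exactly, which is why this tends to look better than drawing each hue independently at random.
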
BanzaiDB.imaging.hsv_to_rgb(h, s, v)

Convert HSV to RGB

Parameters:
  • h – hue
  • s – saturation
  • v – value
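
For reference, the standard sector-based HSV to RGB conversion can be written out as follows. This sketch takes h, s and v in [0, 1] and returns floats in [0, 1]. Whether BanzaiDB's own version scales its output to integers is not stated here. The sector arithmetic follows the same steps as the standard library's colorsys.hsv_to_rgb, so the two can be cross-checked.

```python
import colorsys


def hsv_to_rgb(h, s, v):
    """Convert hue, saturation, value (each in [0, 1]) to an (r, g, b) float tuple."""
    h = h % 1.0
    i = int(h * 6.0)       # index of the 60-degree hue sector
    f = (h * 6.0) - i      # position within that sector
    p = v * (1.0 - s)
    q = v * (1.0 - s * f)
    t = v * (1.0 - s * (1.0 - f))
    i %= 6                 # guard against h * 6.0 rounding up to 6.0
    return [(v, t, p), (q, v, p), (p, v, t),
            (p, q, v), (t, p, v), (v, p, q)][i]
```
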
BanzaiDB.imaging.plot_SNPs(snp_features, labels)

Using GenomeDiagram from Biopython, generate a SNP position plot

BanzaiDB.misc module

BanzaiDB.misc.create_feature(begin, end, feat_type, strand=None)

Creates a Biopython SeqFeature record

Parameters:
  • begin (int) – where the variant starts
  • end (int) – where the variant ends
  • feat_type – whether the variant is a substitution or an INDEL
  • strand – None (default), -1 or 1
Returns: a Bio.SeqFeature object

BanzaiDB.parsers module

Functions to parse a nesoni report .txt file

BanzaiDB.parsers.parse_deletion(consequence)
BanzaiDB.parsers.parse_deletion_misc(consequence)
BanzaiDB.parsers.parse_evidence(evidence)

From an evidence string/element, return a dictionary of obs/counts

Parameters: evidence – an evidence string. It looks something like this: Ax27 AGCAx1 AGCAATTAATTAAAATAAx
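
A minimal sketch of parsing such a string. It assumes each whitespace-separated token is an observed sequence followed by "x" and an integer count, so it splits each token on its last "x". Skipping tokens that have no digits after the "x", and summing repeated observations, are both assumptions rather than documented behaviour.

```python
def parse_evidence(evidence):
    """Map each observed sequence to its integer count, e.g. 'Ax27' -> {'A': 27}."""
    counts = {}
    for token in evidence.split():
        obs, sep, count = token.rpartition("x")
        if not sep or not count.isdigit():
            continue  # skip tokens that do not end in x<count>
        counts[obs] = counts.get(obs, 0) + int(count)  # sum any repeated observation
    return counts
```

Splitting on the last "x" (rpartition) rather than the first keeps the sketch safe if an observation string itself ever contains an "x".
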
BanzaiDB.parsers.parse_insertion(consequence)
BanzaiDB.parsers.parse_insertion_misc(consequence)
BanzaiDB.parsers.parse_substitution(consequence)

Return fields for syn, non-syn or correlated

BanzaiDB.parsers.parse_substitution_misc(consequence)

Return fields for syn, non-syn or correlated

BanzaiDB.parsers.strip_non_CDS(protein_line)

Remove STS/misc_feature etc. from the protein line

Parameters: protein_line – a parsed protein line as a string

Module contents