prot_db
The prot_db module contains classes to handle protein files and protein descriptions, which can either be generated by Prodigal or provided by a Gembase. It also provides an interface that abstracts how protein sequences and descriptions are retrieved.
- class integron_finder.prot_db.GembaseDB(replicon, cfg, gembase_path=None, prot_file=None)[source]
Implements ProteinDB from a Gembase. Manages the proteins from the Proteins directory that correspond to a replicon/contig.
- __getitem__(prot_seq_id)[source]
- Parameters
prot_seq_id (str) – the id of a protein sequence
- Returns
The Sequence corresponding to the prot_seq_id.
- Return type
Bio.SeqRecord
object
- __init__(replicon, cfg, gembase_path=None, prot_file=None)[source]
- Parameters
replicon (Bio.SeqRecord object with an extra attribute path) – The replicon used to create the ProteinDB (protein files and extra information)
cfg (integron_finder.config.Config object) – The integron_finder configuration
prot_file – The path to a protein file in FASTA format which is the translation of the replicon
Warning
The replicon is a modified Bio.SeqRecord object: the attribute path must be injected into it. This attribute holds the path to the FASTA file representing this replicon.
- __iter__()[source]
- Returns
a generator which iterates over the seq_id of the proteins which constitute the contig.
- Return type
generator
- _find_gembase_file_basename(gembase_path, input_seq_path)[source]
From the input file name, try to retrieve the basename which is used in the Gembase. This is especially useful when integron_finder is run in parallel: the input sequence is split into chunks which are processed in parallel, but in this case the name of a chunk matches neither the lstinfo file nor the protein file. So this method tries to retrieve the original basename without extension, for instance:
ACBA.0917.00019.fna => ACBA.0917.00019
ACBA.0917.00019.0001.fna => ACBA.0917.00019
ESCO001.C.00001.C001.fst => ESCO001.C.00001.C001
ESCO001.C.00001.C001_chunk_1.fst => ESCO001.C.00001.C001
- Returns
the Gembase basename corresponding to the input file
- Return type
string
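The four mappings listed above can be reproduced with a small string heuristic. The sketch below is only an illustration: the function name guess_gembase_basename and its rules are assumptions, it ignores the gembase_path argument that the real method receives, and the real method may use a different strategy.

```python
import os
import re


def guess_gembase_basename(input_seq_path):
    """Hypothetical helper: guess the Gembase basename from an input file name."""
    # Drop the directory and the extension (.fna, .fst, ...).
    name = os.path.splitext(os.path.basename(input_seq_path))[0]
    # Drop a trailing "_chunk_<n>" suffix added when the input is split.
    name = re.sub(r"_chunk_\d+$", "", name)
    # Drop a trailing all-digit field when the name has more than three fields
    # (assumed here to be a contig number appended to a Draft basename).
    fields = name.split(".")
    if len(fields) > 3 and fields[-1].isdigit():
        name = ".".join(fields[:-1])
    return name


for path in (
    "ACBA.0917.00019.fna",
    "ACBA.0917.00019.0001.fna",
    "ESCO001.C.00001.C001.fst",
    "ESCO001.C.00001.C001_chunk_1.fst",
):
    print(path, "=>", guess_gembase_basename(path))
```

A purely name-based rule like this one cannot confirm that the guessed basename exists in the Gembase, so a real implementation would presumably also check the files under gembase_path.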
- _make_protfile(path=None)[source]
Create a FASTA file with the proteins corresponding to this sequence, from the corresponding Gembase protfile. This step is necessary because in a Gembase Draft one nucleic file can contain several contigs, but all proteins are in the same file.
- Returns
the path of the created protein file
- Return type
str
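To illustrate why a per-contig protein file has to be extracted from a shared one, here is a minimal sketch that copies only selected records from a multi-record FASTA stream. The helper name extract_proteins, the identifiers and the sequences are all made up, and the assumption that the wanted ids come from the LSTINFO file is not confirmed by this page.

```python
import io


def extract_proteins(fasta_in, fasta_out, wanted_ids):
    """Copy to fasta_out only the records of fasta_in whose id is in wanted_ids.

    Returns the number of records written.
    """
    wanted = set(wanted_ids)
    keep = False
    written = 0
    for line in fasta_in:
        if line.startswith(">"):
            # The record id is the first word of the header line.
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted
            written += keep
        if keep:
            fasta_out.write(line)
    return written


# Made-up identifiers and sequences, for illustration only.
all_proteins = io.StringIO(
    ">prot_a first contig\nMKV\nLLA\n"
    ">prot_b second contig\nMST\n"
    ">prot_c first contig\nMGG\n"
)
subset = io.StringIO()
count = extract_proteins(all_proteins, subset, ["prot_a", "prot_c"])
print(count)
print(subset.getvalue())
```

The same function works on real files by passing objects returned by open() instead of io.StringIO.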
- _parse_lst()[source]
Parse the LSTINFO file and extract information specific to the replicon
- Returns
a pandas.DataFrame object
- static gembase_complete_parser(lst_path, sequence_id)[source]
- Parameters
lst_path (str) – the path of the LSTINFO file from a Gembase Complet
sequence_id (str) – the id of the genomic sequence to analyse
- Returns
the information related to the ‘valid’ CDS corresponding to the sequence_id
- Return type
pandas.DataFrame object
- static gembase_draft_parser(lst_path, replicon_id)[source]
- Parameters
lst_path (str) – the path of the LSTINFO file from a Gembase Draft
replicon_id (str) – the id of the genomic sequence to analyse
- Returns
the information related to the ‘valid’ CDS corresponding to the replicon_id
- Return type
pandas.DataFrame object
- static gembase_sniffer(lst_path)[source]
Detect the type of gembase
- Parameters
lst_path (str) – the path to the LSTINFO file corresponding to the nucleic sequence
- Returns
either ‘Complet’ or ‘Draft’
- get_description(gene_id)[source]
- Parameters
gene_id (str) – a protein/gene identifier
- Returns
The description of the protein corresponding to the gene_id
- Return type
SeqDesc
namedtuple object
- Raises
IntegronError – when gene_id is not a valid Gembase gene identifier
KeyError – if gene_id is not found in GembaseDB instance
- class integron_finder.prot_db.ProdigalDB(replicon, cfg, prot_file=None)[source]
Creates proteins from a replicon/contig using Prodigal and provides facilities to access them.
- __getitem__(prot_seq_id)[source]
- Parameters
prot_seq_id (str) – the id of a protein sequence
- Returns
The Sequence corresponding to the prot_seq_id.
- Return type
Bio.SeqRecord
object
- __iter__()[source]
- Returns
a generator which iterates over the seq_id of the proteins which constitute the contig.
- Return type
generator
- _make_protfile(path=None)[source]
Use prodigal to generate proteins corresponding to the replicon
- Returns
the path of the created protfile
- Return type
str
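As a rough idea of what generating a protein file with Prodigal involves, the sketch below builds a Prodigal command line without running it. The helper name build_prodigal_command and the file names are hypothetical, and which options integron_finder actually passes is an assumption. The flags themselves are standard Prodigal options: -i (input sequence), -a (protein translations file), -p (procedure, single or meta) and -q (quiet).

```python
import shlex


def build_prodigal_command(replicon_path, prot_path, meta=False):
    """Hypothetical helper: build a Prodigal command line as a list of arguments."""
    cmd = ["prodigal", "-i", replicon_path, "-a", prot_path, "-q"]
    if meta:
        # "-p meta" selects Prodigal's metagenomic procedure.
        cmd += ["-p", "meta"]
    return cmd


cmd = build_prodigal_command("replicon.fst", "replicon.prt", meta=True)
print(shlex.join(cmd))
# The list could then be handed to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.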
- get_description(gene_id)[source]
- Parameters
gene_id (str) – a protein/gene identifier
- Returns
The description of the protein corresponding to the gene_id
- Return type
SeqDesc
namedtuple object
- Raises
IntegronError – when gene_id is not a valid Prodigal gene identifier
KeyError – if gene_id is not found in ProdigalDB instance
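Prodigal writes the coordinates of each predicted protein in its FASTA header, as fields separated by " # " (id, start, stop, strand, attributes). The sketch below shows how such a header could be turned into a SeqDesc. It is an illustration under assumptions, not the library's code: parse_prodigal_header is a hypothetical helper, SeqDesc is redefined locally with the documented field order, storing strand as an int is a guess, and ValueError stands in for IntegronError.

```python
from collections import namedtuple

# Same field order as the SeqDesc documented in this module.
SeqDesc = namedtuple("SeqDesc", ("id", "strand", "start", "stop"))


def parse_prodigal_header(header):
    """Hypothetical helper: turn a Prodigal protein header into a SeqDesc."""
    try:
        id_, start, stop, strand, _attrs = header.lstrip(">").split(" # ", 4)
        return SeqDesc(id_, int(strand), int(start), int(stop))
    except ValueError:
        # ValueError stands in for the library's IntegronError.
        raise ValueError(f"{header!r} is not a valid Prodigal header") from None


# Made-up header in Prodigal's layout.
desc = parse_prodigal_header(">contig_1 # 2 # 181 # -1 # ID=1_1;partial=10")
print(desc)
```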
- class integron_finder.prot_db.ProteinDB(replicon, cfg, prot_file=None)[source]
Abstract class defining the interface for ProteinDB. ProteinDB provides an abstraction and a way to access the proteins corresponding to the replicon/contig CDS.
- abstract __getitem__(prot_seq_id)[source]
- Parameters
prot_seq_id (str) – the id of a protein sequence
- Returns
The Sequence corresponding to the prot_seq_id.
- Return type
Bio.SeqRecord
object
- Raises
KeyError – when prot_seq_id does not match any sequence in the DB
- abstract __iter__()[source]
- Returns
a generator which iterates over the seq_id of the proteins which constitute the contig.
- Return type
generator
- __weakref__
list of weak references to the object (if defined)
- _make_db()[source]
- Returns
an index of the sequences contained in the protfile corresponding to the replicon
- abstract _make_protfile(path=None)[source]
Create fasta file with protein corresponding to the nucleic sequence (replicon)
- Returns
the path of the created protein file
- Return type
str
- abstract get_description(gene_id)[source]
- Parameters
gene_id (str) – a protein/gene identifier
- Returns
The description of the protein corresponding to the gene_id
- Return type
SeqDesc
namedtuple object
- Raises
IntegronError – when gene_id is not a valid Gembase gene identifier
KeyError – if gene_id is not found in the ProteinDB instance
- property protfile
- Returns
The absolute path to the protein file corresponding to contig id
- Return type
str
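The interface described for ProteinDB (lookup by id, iteration over ids, description lookup) can be pictured with a toy abstract class and an in-memory implementation. Everything in the sketch below is a stand-in: ToyProteinDB and DictProteinDB are hypothetical names, sequences are plain strings rather than Bio.SeqRecord objects, and no protein file is involved.

```python
from abc import ABC, abstractmethod
from collections import namedtuple

SeqDesc = namedtuple("SeqDesc", ("id", "strand", "start", "stop"))


class ToyProteinDB(ABC):
    """Toy stand-in for the ProteinDB interface (not the library class)."""

    @abstractmethod
    def __getitem__(self, prot_seq_id):
        """Return the sequence for prot_seq_id or raise KeyError."""

    @abstractmethod
    def __iter__(self):
        """Iterate over the protein sequence ids."""

    @abstractmethod
    def get_description(self, gene_id):
        """Return a SeqDesc for gene_id or raise KeyError."""


class DictProteinDB(ToyProteinDB):
    """In-memory implementation backed by a dict of (sequence, SeqDesc)."""

    def __init__(self, records):
        self._records = dict(records)

    def __getitem__(self, prot_seq_id):
        return self._records[prot_seq_id][0]

    def __iter__(self):
        return (seq_id for seq_id in self._records)

    def get_description(self, gene_id):
        return self._records[gene_id][1]


# Made-up records, for illustration only.
db = DictProteinDB({
    "p1": ("MKV", SeqDesc("p1", 1, 10, 21)),
    "p2": ("MST", SeqDesc("p2", -1, 40, 51)),
})
for seq_id in db:
    print(seq_id, db[seq_id], db.get_description(seq_id).strand)
```

Code written against this interface does not need to know whether the proteins came from Prodigal or from a Gembase, which is the point of the abstraction.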
- class integron_finder.prot_db.SeqDesc(id, strand, start, stop)
- __getnewargs__()
Return self as a plain tuple. Used by copy and pickle.
- static __new__(_cls, id, strand, start, stop)
Create new instance of SeqDesc(id, strand, start, stop)
- __repr__()
Return a nicely formatted representation string
- _asdict()
Return a new dict which maps field names to their values.
- classmethod _make(iterable)
Make a new SeqDesc object from a sequence or iterable
- _replace(**kwds)
Return a new SeqDesc object replacing specified fields with new values
- id
Alias for field number 0
- start
Alias for field number 2
- stop
Alias for field number 3
- strand
Alias for field number 1
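The members listed for SeqDesc are those of a standard namedtuple. The sketch below rebuilds an equivalent type locally with collections.namedtuple, using the documented field order, and exercises the documented helpers. The values are made up, and the local definition is an assumption about how the library declares the type.

```python
from collections import namedtuple

# Field order follows the documented aliases: id=0, strand=1, start=2, stop=3.
SeqDesc = namedtuple("SeqDesc", ("id", "strand", "start", "stop"))

desc = SeqDesc(id="prot_1", strand=1, start=100, stop=400)
print(desc.id, desc[0])       # field access by name or by position
print(desc._asdict())         # mapping of field names to values

moved = desc._replace(start=130)   # returns a new object, desc is unchanged
print(moved)

rebuilt = SeqDesc._make(["prot_2", -1, 500, 800])   # build from any iterable
print(rebuilt.stop - rebuilt.start)
```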