infernal
-
integron_finder.infernal.expand(replicon, window_beg, window_end, max_elt, circular, dist_threshold, model_attc_path, max_attc_size=200, min_attc_size=40, evalue_attc=1.0, search_left=False, search_right=False, out_dir='.', cpu=1, cmsearch_bin='cmsearch')
For a given element, search for attC sites on the left-hand side (for instance, if the integrase is on the right), on the right-hand side (the opposite situation), or on both sides (if the element contains only an integrase or only attC sites).
- Parameters
replicon (a Bio.Seq.SeqRecord object) – The replicon to annotate.
window_beg (int) – Start of the window to search for attC sites (position of protein).
window_end (int) – End of the window to search for attC sites (position of protein).
max_elt (pandas.DataFrame object) – DataFrame with columns Accession_number, cm_attC, cm_debut, cm_fin, pos_beg, pos_end, sens, evalue; each row is an occurrence of an attC site.
df_max (pandas.DataFrame object) – DataFrame with columns Accession_number, cm_attC, cm_debut, cm_fin, pos_beg, pos_end, sens, evalue; each row is an occurrence of an attC site.
circular (bool) – True if replicon topology is circular otherwise False.
dist_threshold (int) – Two elements are aggregated if they are separated by dist_threshold [4 kb] or less
max_attc_size (int) – The maximum value for the attC size
min_attc_size (int) – The minimum value for the attC size
model_attc_path (str) – the path to the attC model file
evalue_attc (float) – E-value threshold; hits with a higher E-value are filtered out
search_left (bool) – trigger the local_max search on the left of the already detected element
search_right (bool) – trigger the local_max search on the right of the already detected element
out_dir (str) – The path to the directory where results are written
cpu (int) – the number of CPUs used by expand
cmsearch_bin (str) – the path to the cmsearch binary to use
- Returns
a copy of max_elt with attC hits
- Return type
pandas.DataFrame object
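The left/right search can be pictured as a window of dist_threshold bp placed beside the element. The sketch below is a hypothetical illustration, not integron_finder's code: search_window and expand_right are invented names, the search callback stands in for a local_max call, and the idea that the scan repeats while new attC sites keep turning up is an assumption drawn from the function's name and docstring. On a circular replicon the window may wrap past the origin.

```python
from typing import Callable, List, Tuple

# (pos_beg, pos_end) of an attC hit on the replicon
Hit = Tuple[int, int]


def search_window(elt_beg: int, elt_end: int, dist_threshold: int,
                  replicon_size: int, circular: bool, side: str) -> Tuple[int, int]:
    """Hypothetical helper: window of dist_threshold bp next to an element."""
    if side == "left":
        beg, end = elt_beg - dist_threshold, elt_beg
    elif side == "right":
        beg, end = elt_end, elt_end + dist_threshold
    else:
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    if circular:
        # The window may wrap around the origin, giving beg > end.
        return beg % replicon_size, end % replicon_size
    # On a linear replicon the window is clipped to the sequence bounds.
    return max(0, beg), min(replicon_size, end)


def expand_right(elt_end: int, dist_threshold: int, replicon_size: int,
                 search: Callable[[int, int], List[Hit]]) -> List[Hit]:
    """Hypothetical loop: keep scanning to the right while new hits appear.

    Linear replicon only; `search` stands in for a local_max call.
    """
    found: List[Hit] = []
    seen = set()
    while True:
        beg, end = search_window(elt_end, elt_end, dist_threshold,
                                 replicon_size, circular=False, side="right")
        new_hits = [hit for hit in search(beg, end) if hit not in seen]
        if not new_hits:
            return found
        seen.update(new_hits)
        found.extend(new_hits)
        # The element now ends at the right-most newly found hit.
        elt_end = max(end_pos for _, end_pos in new_hits)
```

Because already-seen hits are skipped, the loop stops as soon as a window adds nothing new.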
-
integron_finder.infernal.find_attc(replicon_path, replicon_id, cmsearch_path, out_dir, model_attc, incE=1.0, cpu=1)
Call cmsearch to find attC sites in a single replicon.
- Parameters
replicon_path (str) – the path of the fasta file representing the replicon to analyse.
replicon_id (str) – the id of the replicon to analyse.
cmsearch_path (str) – the path to the cmsearch executable.
out_dir (str) – the path to the directory where cmsearch outputs will be stored.
model_attc (str) – path to the attC model (covariance model, CM).
incE (float) – hits with an E-value <= this threshold are considered significant (these are the hits included in the alignment saved with -A).
cpu (int) – the number of CPUs used by cmsearch.
- Returns
None; the results are written to disk.
- Raises
RuntimeError – when the cmsearch run fails.
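A minimal sketch of how such a cmsearch call could be assembled and checked with the standard library. The options used (--cpu, -A, --tblout, --incE, -o) are standard Infernal 1.1 cmsearch options, but build_cmsearch_cmd, run_cmsearch and the output file names are hypothetical; integron_finder may name its files and pass its options differently.

```python
import os
import subprocess
from typing import List


def build_cmsearch_cmd(replicon_path: str, replicon_id: str, cmsearch_path: str,
                       out_dir: str, model_attc: str, incE: float = 1.0,
                       cpu: int = 1) -> List[str]:
    """Hypothetical helper: assemble a cmsearch command line for one replicon."""
    # Output file names are illustrative, not necessarily integron_finder's.
    prefix = os.path.join(out_dir, replicon_id)
    return [
        cmsearch_path,
        "--cpu", str(cpu),                       # number of worker threads
        "-A", f"{prefix}_attc.stk",              # alignment of significant hits
        "--tblout", f"{prefix}_attc_table.res",  # tabular hit list (see read_infernal)
        "--incE", str(incE),                     # significance (inclusion) threshold
        "-o", f"{prefix}_attc.res",              # human-readable output
        model_attc,                              # covariance model file
        replicon_path,                           # sequences to search (FASTA)
    ]


def run_cmsearch(cmd: List[str]) -> None:
    """Hypothetical helper: run the command, raising RuntimeError on failure."""
    try:
        completed = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as err:  # e.g. the binary does not exist
        raise RuntimeError(f"cannot run {cmd[0]}: {err}") from err
    if completed.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} failed with exit code {completed.returncode}: "
            f"{completed.stderr.strip()}")
```

Keeping command construction separate from execution makes the arguments easy to inspect without Infernal installed.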
-
integron_finder.infernal.local_max(replicon, window_beg, window_end, model_attc_path, strand_search='both', evalue_attc=1.0, max_attc_size=200, min_attc_size=40, cmsearch_bin='cmsearch', out_dir='.', cpu=1)
- Parameters
replicon (Bio.Seq.SeqRecord object) – The replicon to analyse.
window_beg (int) – Start of the window to search for attC sites (position of protein).
window_end (int) – End of the window to search for attC sites (position of protein).
model_attc_path (str) – The path to the covariance model for attC sites (e.g. attc_4.cm) used by cmsearch to find attC sites
strand_search (str) – The strand on which to look for attC sites. Available values:
'top': only search the top (Watson) strand of target sequences.
'bottom': only search the bottom (Crick) strand of target sequences.
'both': search on both strands.
evalue_attc (float) – E-value threshold; hits with a higher E-value are filtered out
max_attc_size (int) – The maximum value for the attC size
min_attc_size (int) – The minimum value for the attC size
cmsearch_bin (str) – The path to cmsearch
out_dir (str) – The path to the directory where results are written
cpu (int) – The number of CPUs used by cmsearch
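cmsearch searches both strands by default and can be limited to one with --toponly or --bottomonly. Assuming strand_search maps onto those two options (an assumption; this page does not say how local_max passes it on), the translation could look like this hypothetical helper:

```python
from typing import List

# Hypothetical mapping from strand_search values to Infernal cmsearch options.
_STRAND_OPTIONS = {"top": ["--toponly"], "bottom": ["--bottomonly"], "both": []}


def strand_options(strand_search: str = "both") -> List[str]:
    """Hypothetical helper: translate strand_search into cmsearch options."""
    try:
        # Return a copy so callers can extend the list safely.
        return list(_STRAND_OPTIONS[strand_search])
    except KeyError:
        raise ValueError(
            f"strand_search must be 'top', 'bottom' or 'both', got {strand_search!r}"
        ) from None
```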
- Returns
DataFrame with the same structure as the DataFrame returned by read_infernal(), where positions are converted to positions on the replicon and attC hits are filtered by evalue_attc, min_attc_size and max_attc_size. The function also writes a file of intermediate results, <replicon_id>_subseq_attc_table_end.res, which stores the local_max results before filtering by max_attc_size and min_attc_size.
- Return type
pandas.DataFrame object
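The coordinate conversion described above can be sketched as follows, assuming window_beg is a 0-based slice start on the replicon, hit positions are 1-based within the window (as cmsearch reports them) with pos_beg <= pos_end, and both size bounds are inclusive. window_subseq and to_replicon_coords are hypothetical helpers working on plain dicts rather than a DataFrame, and the intermediate .res file is not written.

```python
from typing import Dict, List


def window_subseq(seq: str, window_beg: int, window_end: int) -> str:
    """Hypothetical: slice a window, joining both ends if it wraps the origin."""
    if window_beg <= window_end:
        return seq[window_beg:window_end]
    return seq[window_beg:] + seq[:window_end]


def to_replicon_coords(hits: List[Dict], window_beg: int, replicon_size: int,
                       evalue_attc: float = 1.0, min_attc_size: int = 40,
                       max_attc_size: int = 200) -> List[Dict]:
    """Hypothetical: shift window hits to replicon coordinates and filter them."""
    kept = []
    for hit in hits:
        size = hit["pos_end"] - hit["pos_beg"] + 1
        if hit["evalue"] > evalue_attc or not min_attc_size <= size <= max_attc_size:
            continue
        # 1-based position p in the window is 0-based index window_beg + p - 1
        # on the replicon; the modulo handles windows that wrap the origin.
        beg = (window_beg + hit["pos_beg"] - 1) % replicon_size + 1
        end = (window_beg + hit["pos_end"] - 1) % replicon_size + 1
        kept.append({**hit, "pos_beg": beg, "pos_end": end})
    return kept
```

With a window starting at 9990 on a 10 kb circular replicon, a hit at window positions 1 to 50 maps to 9991 and 40, so a hit spanning the origin ends up with pos_beg > pos_end.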
-
integron_finder.infernal.read_infernal(infile, replicon_id, len_model_attc, evalue=1, size_max_attc=200, size_min_attc=40)
Parse the cmsearch --tblout output and return a pandas DataFrame.
- Parameters
infile (str) – the path to the output of cmsearch in tabular format (--tblout)
replicon_id (str) – the id of the replicon where the integrons were found.
len_model_attc (int) – the length of the attC model
evalue (float) – E-value threshold; hits with a higher E-value are filtered out
size_max_attc (int) – The maximum value for the attC size
size_min_attc (int) – The minimum value for the attC size
- Returns
table with columns:
"Accession_number", "cm_attC", "cm_debut", "cm_fin", "pos_beg", "pos_end", "sens", "evalue"; each row is a hit that matches the attC covariance model.
- Return type
pandas.DataFrame object
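A stdlib-only sketch of turning cmsearch --tblout lines into rows with the columns listed above, assuming the Infernal 1.1 tabular layout (18 whitespace-separated fields, the last being the free-text target description; comment lines start with #). This is not integron_finder's implementation: parse_tblout is a hypothetical name, it returns a list of dicts rather than a DataFrame, it normalises minus-strand hits so that pos_beg <= pos_end, it treats both size bounds as inclusive, and it ignores len_model_attc, whose use this page does not describe.

```python
from typing import Dict, Iterable, List


def parse_tblout(lines: Iterable[str], replicon_id: str, evalue: float = 1.0,
                 size_max_attc: int = 200, size_min_attc: int = 40) -> List[Dict]:
    """Hypothetical parser for cmsearch --tblout lines."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        # Split into at most 18 fields so the description keeps its spaces.
        fields = line.split(maxsplit=17)
        seq_from, seq_to = int(fields[7]), int(fields[8])
        row = {
            "Accession_number": replicon_id,
            "cm_attC": fields[2],        # query (model) name
            "cm_debut": int(fields[5]),  # first matched model position
            "cm_fin": int(fields[6]),    # last matched model position
            "pos_beg": min(seq_from, seq_to),
            "pos_end": max(seq_from, seq_to),
            "sens": fields[9],           # strand, '+' or '-'
            "evalue": float(fields[15]),
        }
        size = row["pos_end"] - row["pos_beg"] + 1
        if row["evalue"] <= evalue and size_min_attc <= size <= size_max_attc:
            rows.append(row)
    return rows
```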