attc
- integron_finder.attc.find_attc_max(integrons, replicon, distance_threshold, model_attc_path, max_attc_size, min_attc_size, evalue_attc=1.0, circular=True, out_dir='.', cmsearch_bin='cmsearch', cpu=1)
Look for attC sites with the cmsearch --max option, which removes all heuristic filters. Because this option makes the algorithm much slower, it is only run in the region around a hit. This mode is called local_max or eagle_eyes.
Default hit
                     attC
__________________-->____-->_________-->_____________
______<--------______________________________________
        intI
              ^-------------------------------------^
              Search-space with --local_max
Updated hit
                     attC      ***         ***
__________________-->____-->___-->___-->___-->_______
______<--------______________________________________
        intI
- Parameters
integrons (list of Integron objects) – the integrons, which may or may not contain attC or intI.
replicon (Bio.Seq.SeqRecord object) – replicon where the integrons were found (genomic fasta file).
distance_threshold (int) – the maximal distance between 2 elements to aggregate them.
evalue_attc (float) – evalue threshold to filter out hits above it.
model_attc_path (str) – path to the attc model (Covariance Matrix).
max_attc_size (int) – maximum value for the attC size.
min_attc_size (int) – minimum value for the attC size.
circular (bool) – True if replicon is circular, False otherwise.
out_dir (str) – the directory where results are written; used indirectly by some called functions such as infernal.local_max() or infernal.expand.
cmsearch_bin (str) – the path to the cmsearch binary to use.
cpu (int) – the number of CPUs to use when calling local_max.
- Returns
- Return type
pd.DataFrame object
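The idea of restricting the slow search to "the region around a hit" can be illustrated with a small window computation. This is only a sketch under stated assumptions: `search_window` is a hypothetical helper, not part of integron_finder, and the choice to extend the hit by `distance_threshold` on each side is an assumption made for illustration. It shows how a window might be clipped on a linear replicon and wrapped around the origin on a circular one.

```python
from typing import List, Tuple


def search_window(hit_start: int, hit_end: int, distance_threshold: int,
                  replicon_size: int, circular: bool) -> List[Tuple[int, int]]:
    """Hypothetical helper: interval(s) to rescan around one hit.

    Intervals are 0-based, half-open (start, end) pairs. A window that
    crosses the origin of a circular replicon is split into two pieces.
    """
    # Assumption: extend the hit by distance_threshold on both sides.
    start = hit_start - distance_threshold
    end = hit_end + distance_threshold
    if not circular:
        # Linear replicon: clip the window to the sequence ends.
        return [(max(start, 0), min(end, replicon_size))]
    if end - start >= replicon_size:
        # The window is at least as long as the replicon: scan everything.
        return [(0, replicon_size)]
    if start < 0:
        # Window runs past the origin on the left: wrap to the far end.
        return [(start + replicon_size, replicon_size), (0, end)]
    if end > replicon_size:
        # Window runs past the end: wrap to the beginning.
        return [(start, replicon_size), (0, end - replicon_size)]
    return [(start, end)]


if __name__ == "__main__":
    print(search_window(100, 200, 500, 10_000, circular=False))
    print(search_window(100, 200, 500, 10_000, circular=True))
```

Only the sub-sequences returned by such a helper would then need to be scanned with the expensive --max search, which is what keeps the overall run time manageable.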
- integron_finder.attc.search_attc(attc_df, keep_palindromes, dist_threshold, replicon_size, rep_topology)
Parse the attC data set (sorted along start site) for the given replicon and return a list of arrays. Each array is composed of attC sites that are on the same strand and separated by a distance less than dist_threshold.
- Parameters
attc_df (pandas.DataFrame) – the attC data set to parse (sorted along start site)
keep_palindromes (bool) – True if the palindromes must be kept in the attC result, False otherwise
dist_threshold (int) – the maximal distance between 2 elements to aggregate them
replicon_size (int) – the size of the replicon in base pairs
rep_topology (str) – the replicon topology; should be ‘lin’ or ‘circ’
- Returns
a list of attC sites found on the replicon
- Return type
list of pandas.DataFrame objects
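The grouping rule described for search_attc (same strand, gap smaller than dist_threshold) can be sketched as follows. This is a minimal illustration, not the library's implementation: `Site` and `group_sites` are hypothetical names, plain tuples stand in for DataFrame rows to keep the example dependency-free, the gap is assumed to be measured from the end of one site to the start of the next, the handling of a circular replicon is an assumption, and palindrome filtering (keep_palindromes) is not modelled.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    """Hypothetical stand-in for one attC hit (one row of the attC table)."""
    start: int
    end: int
    strand: int  # +1 or -1


def group_sites(sites: List[Site], dist_threshold: int,
                replicon_size: int, rep_topology: str) -> List[List[Site]]:
    """Cluster same-strand sites whose gaps are below dist_threshold."""
    clusters: List[List[Site]] = []
    for strand in (1, -1):
        # Work on one strand at a time, ordered along the replicon.
        same = sorted((s for s in sites if s.strand == strand),
                      key=lambda s: s.start)
        if not same:
            continue
        groups = [[same[0]]]
        for prev, cur in zip(same, same[1:]):
            # Assumption: gap = start of current site - end of previous one.
            if cur.start - prev.end < dist_threshold:
                groups[-1].append(cur)
            else:
                groups.append([cur])
        # Assumption: on a circular replicon, the last group may continue
        # across the origin into the first group.
        if rep_topology == "circ" and len(groups) > 1:
            wrap_gap = groups[0][0].start + replicon_size - groups[-1][-1].end
            if wrap_gap < dist_threshold:
                groups[0] = groups.pop() + groups[0]
        clusters.extend(groups)
    return clusters


if __name__ == "__main__":
    hits = [Site(100, 160, 1), Site(300, 360, 1), Site(5000, 5060, 1),
            Site(200, 260, -1), Site(9900, 9960, 1)]
    for cluster in group_sites(hits, 500, 10_000, "circ"):
        print(cluster)
```

With rep_topology set to ‘lin’ the site near the end of the replicon stays in its own group, whereas with ‘circ’ it is merged with the sites just after the origin.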