Aligner
This module implements the aligner.
- class Data(score: float)
Private data class for the Needleman-Wunsch+Gotoh sequence aligner.
- __init__(score: float)
- score: float
The current score.
- p: float
\(P_{m,n}\) in [Gotoh1982].
- q: float
\(Q_{m,n}\) in [Gotoh1982].
- pSize: int
The size of the p gap. \(k\) in [Gotoh1982].
- qSize: int
The size of the q gap. \(k\) in [Gotoh1982].
- class Aligner(start_score: float = -1.0, open_score: float = -1.0, extend_score: float = -0.5)
A generic Needleman-Wunsch+Gotoh sequence aligner.
This implementation uses Gotoh’s improvements to get \(\mathcal{O}(mn)\) running time and reduce memory requirements to essentially the backtracking matrix only. In Gotoh’s technique the gap weight formula must be of the special form \(w_k = uk + v\) (affine gap). \(k\) is the gap size, \(v\) is the gap opening score and \(u\) the gap extension score.
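The recurrences behind the affine gap formula can be illustrated with a minimal, score-only sketch. This is not the library's code: the function name `gotoh_score` is hypothetical, the `start_score` parameter is ignored, no backtracking matrix is kept, and a gap of size \(k\) is assumed to score exactly \(w_k = uk + v\) as stated above.

```python
from math import inf
from typing import Callable, Sequence


def gotoh_score(
    a: Sequence[object],
    b: Sequence[object],
    similarity: Callable[[object, object], float],
    open_score: float = -1.0,    # v in the text above
    extend_score: float = -0.5,  # u in the text above
) -> float:
    """Return the best global alignment score under w_k = u*k + v.

    Hypothetical illustration only: keeps two rows of scores, so it
    cannot reconstruct the alignment itself.
    """
    u, v = extend_score, open_score
    n = len(b)
    # Row 0: the empty prefix of a against b[:j] is a single gap of size j.
    d_prev = [0.0] + [u * j + v for j in range(1, n + 1)]
    p_prev = [-inf] * (n + 1)  # best score ending in a gap along a (P)
    for i in range(1, len(a) + 1):
        d_cur = [u * i + v] + [0.0] * n
        p_cur = [-inf] * (n + 1)
        q = -inf  # best score ending in a gap along b (Q), current row only
        for j in range(1, n + 1):
            # Either open a new gap (u + v) or extend the running one (u).
            p_cur[j] = max(d_prev[j] + u + v, p_prev[j] + u)
            q = max(d_cur[j - 1] + u + v, q + u)
            match = d_prev[j - 1] + similarity(a[i - 1], b[j - 1])
            d_cur[j] = max(match, p_cur[j], q)
        d_prev, p_prev = d_cur, p_cur
    return d_prev[n]


def exact(x: object, y: object) -> float:
    """Toy similarity: +1 for equal items, -1 otherwise."""
    return 1.0 if x == y else -1.0


print(gotoh_score("abc", "ac", exact))  # -> 0.5 (two matches, one gap of size 1)
```

Each cell is computed in constant time from its three neighbours, which is where the \(\mathcal{O}(mn)\) bound comes from; a full aligner would additionally record gap sizes per cell for backtracking.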
The aligner is type-agnostic. When the aligner wants to compare two objects, it calls the similarity() function with both objects as arguments. This function should return the score of the alignment. The score should increase with the desirability of the alignment, but otherwise there are no fixed rules. The score must harmonize with the penalties for inserting gaps. If the score for opening a gap is -1.0 (the default), then a satisfactory match should return a score > 1.0.
The similarity() function may consult a PAM or BLOSUM matrix, or compute a Hamming distance between the arguments. It may also use auxiliary data like part-of-speech (POS) tags. In this case the data type aligned could be a dict containing the word and the POS tag.
- __init__(start_score: float = -1.0, open_score: float = -1.0, extend_score: float = -0.5)
- start_score: float
The gap opening score at the start of the string. Set this to 0 to find local alignments.
- open_score: float
The gap opening score \(v\).
- extend_score: float
The gap extension score \(u\).
- align(seq_a: Sequence[object], seq_b: Sequence[object], similarity: Callable[[object, object], float], gap_a: Optional[Callable[[], object]] = None, gap_b: Optional[Callable[[], object]] = None) -> Tuple[Sequence[object], Sequence[object], float]
Align two sequences.
- Parameters
similarity – a callable that returns the similarity of two objects
gap_a – insert gap_a() for a gap in sequence a. None inserts None.
gap_b – insert gap_b() for a gap in sequence b. None inserts gap_a().
- Returns
the aligned sequences and the score
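As an illustration of the callables that align() expects, the following sketch defines a similarity function over dicts holding a word and its POS tag, plus a gap factory. The names `token_similarity` and `gap_token`, the dict keys `"word"` and `"pos"`, and the score values are all assumptions chosen for this example; only the callable signatures come from the documentation above.

```python
from typing import Dict

Token = Dict[str, str]  # hypothetical token shape: {"word": ..., "pos": ...}


def token_similarity(a: Token, b: Token) -> float:
    """Score two tokens; higher means a more desirable pairing.

    The values are illustrative. They are picked so that a good match
    scores above 1.0, in line with the default gap opening score of -1.0.
    """
    if a["word"] == b["word"]:
        return 2.0   # identical words: clearly worth aligning
    if a["pos"] == b["pos"]:
        return 0.5   # same part of speech only: a weak match
    return -1.0      # unrelated tokens: discourage pairing them


def gap_token() -> Token:
    """Build the placeholder inserted where a sequence has a gap."""
    return {"word": "-", "pos": ""}


cat = {"word": "cat", "pos": "NN"}
dog = {"word": "dog", "pos": "NN"}
runs = {"word": "runs", "pos": "VBZ"}
print(token_similarity(cat, cat), token_similarity(cat, dog), token_similarity(cat, runs))
```

Going by the signature above, these would be passed as `align(seq_a, seq_b, token_similarity, gap_token)`; with `gap_b` left as None, the same placeholder factory is used for both sequences.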
- build_debug_matrix(matrix: List[List[super_collator.aligner.Data]], len_matrix: List[List[int]], ts_a: Sequence[object], ts_b: Sequence[object]) -> str
Build a human-readable debug matrix.
- Parameters
matrix – the full scoring matrix
len_matrix – the backtracking matrix
ts_a – the first aligned string
ts_b – the second aligned string
- Returns
the debug matrix as a human-readable string