The query_scorer Module

Scoring algorithms for finding similar documents

class pysimsearch.query_scorer.QueryScorer

Interface for query scorers which score similarity search results

QueryScorers are used by the SimIndex.query() method to handle the scoring of similarity search results.

static make_scorer(scorer_type)

Returns a new scorer object of the given scorer_type

static register_scorers(scorer_map)
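The pairing of make_scorer and register_scorers suggests a registry-based factory. A minimal sketch of that pattern follows, assuming scorer_map maps a type name to a scorer class; the class names QueryScorerSketch and CountScorerSketch and the "count" key are hypothetical, and this is not pysimsearch's actual implementation.

```python
class QueryScorerSketch:
    """Hypothetical registry/factory sketch; not pysimsearch's actual code."""

    # Shared mapping from scorer-type name to implementing class.
    _registry = {}

    @staticmethod
    def register_scorers(scorer_map):
        # Add (or overwrite) name -> class entries in the shared registry.
        QueryScorerSketch._registry.update(scorer_map)

    @staticmethod
    def make_scorer(scorer_type):
        # Look up the class registered under scorer_type and return a new instance.
        try:
            scorer_cls = QueryScorerSketch._registry[scorer_type]
        except KeyError:
            raise ValueError(f"unknown scorer type: {scorer_type!r}") from None
        return scorer_cls()

    def score_docs(self, query_vec, postings_lists, **extra):
        # Subclasses return a sorted iterable of (docid, score) tuples.
        raise NotImplementedError


class CountScorerSketch(QueryScorerSketch):
    """Hypothetical concrete scorer used only to exercise the registry."""

    def score_docs(self, query_vec, postings_lists, **extra):
        return []


# Usage: register under a (hypothetical) name, then build a scorer by name.
QueryScorerSketch.register_scorers({"count": CountScorerSketch})
scorer = QueryScorerSketch.make_scorer("count")
```

Keeping the registry in a class attribute that static methods update means every subclass shares one lookup table, which fits both methods being documented as static.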

score_docs(query_vec, postings_lists, **extra)

Scores documents’ similarities to query

Scans postings_lists to compute a similarity score between each matching document and the query term vector

Params:
    query_vec: the term vector of the query document
    postings_lists: a list of postings lists for the terms in the query
Returns:
A sorted iterable of (docid, score) tuples
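The score_docs contract (scan postings lists, accumulate per-document scores, return sorted (docid, score) tuples) can be sketched generically as below. The data shapes are assumptions for illustration: query_vec is a dict of term to query frequency, each postings list is paired with its term as (term, [(docid, term_freq), ...]), and hit_score is a hypothetical callback that scores one term hit.

```python
from collections import defaultdict


def score_docs_generic(query_vec, postings_lists, hit_score):
    """Hypothetical sketch of the score_docs contract.

    query_vec:      dict mapping term -> frequency in the query
    postings_lists: list of (term, [(docid, term_freq), ...]) pairs
    hit_score:      function(term, q_tf, d_tf) -> float
    """
    scores = defaultdict(float)
    for term, postings in postings_lists:
        q_tf = query_vec.get(term, 0)
        for docid, d_tf in postings:
            # Every occurrence of a query term in a doc adds to that doc's score.
            scores[docid] += hit_score(term, q_tf, d_tf)
    # Highest score first; ties broken by docid so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


query = {"cat": 1, "dog": 2}
postings = [("cat", [("d1", 3), ("d2", 1)]), ("dog", [("d2", 2)])]
result = score_docs_generic(query, postings, lambda t, q, d: q * d)
```

Here d1 scores 1 × 3 = 3 and d2 scores 1 × 1 + 2 × 2 = 5, so d2 is ranked first. Only documents that appear in at least one postings list receive a score at all.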

class pysimsearch.query_scorer.SimpleCountQueryScorer

QueryScorer that uses simple term frequencies for scoring.

score_docs(query_vec, postings_lists, **extra)

Scores query-document similarity using the number of occurrences of query terms in the document. Multiple occurrences of a term in the query are ignored.
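A minimal sketch of this counting scheme, assuming the same hypothetical shapes as above (query_vec as a dict of term to query frequency, postings lists as (term, [(docid, term_freq), ...]) pairs). The function name simple_count_score_docs is illustrative, not part of pysimsearch.

```python
from collections import Counter


def simple_count_score_docs(query_vec, postings_lists):
    """Hypothetical sketch: score = total occurrences of query terms in the doc."""
    scores = Counter()
    for term, postings in postings_lists:
        if term not in query_vec:
            continue
        # query_vec[term] is deliberately ignored: repeating a term in the
        # query adds no extra weight.
        for docid, d_tf in postings:
            scores[docid] += d_tf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


postings = [("cat", [("d1", 2), ("d2", 1)]), ("dog", [("d2", 4)])]
r_repeated = simple_count_score_docs({"cat": 3, "dog": 1}, postings)
r_single = simple_count_score_docs({"cat": 1, "dog": 1}, postings)
```

Both queries yield the same ranking, d2 (1 + 4 = 5) ahead of d1 (2), because only the document-side counts contribute.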

class pysimsearch.query_scorer.TFIDFQueryScorer(tf_weight_type=u'raw')

QueryScorer that uses TFIDF weighting with the cosine similarity measure.

This implementation is actually an approximation of the true cosine, because of the way it normalizes by document length. When computing document length, we assume a term weight of 1 for each document term. That is, term weights are not factored into the “document length”, since doing so would require choosing the weighting strategy at index time.

Query length is ignored: dividing every document's score by the same query length has no effect on their relative ordering.
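The two normalization points above can be illustrated with a small sketch. It assumes document length is the Euclidean norm of a vector in which every distinct document term has weight 1 (so length = sqrt(number of distinct terms)), whereas the dot product uses the actual weights; the true cosine would instead use the square root of the sum of squared document weights. The function names and data are hypothetical.

```python
import math


def approx_cosine_scores(query_w, docs):
    """Hypothetical sketch of length-approximated cosine scoring.

    query_w: dict term -> query term weight
    docs:    dict docid -> dict term -> document term weight
    """
    scores = {}
    for docid, doc_w in docs.items():
        dot = sum(w * doc_w.get(t, 0.0) for t, w in query_w.items())
        # Approximate length: every document term counted with weight 1.
        approx_len = math.sqrt(len(doc_w))
        scores[docid] = dot / approx_len  # query length deliberately not used
    return scores


def ranking(scores):
    # Doc ids ordered best-first, ties broken by id.
    return sorted(scores, key=lambda d: (-scores[d], d))


query_w = {"cat": 2.0, "dog": 1.0}
docs = {
    "d1": {"cat": 1.0, "fish": 1.0, "bird": 1.0, "cow": 1.0},
    "d2": {"cat": 1.0, "dog": 1.0},
}
scores = approx_cosine_scores(query_w, docs)

# Dividing every score by the same positive query norm cannot change the order.
q_norm = math.sqrt(sum(w * w for w in query_w.values()))
normalized = {d: s / q_norm for d, s in scores.items()}
```

d1 scores 2 / sqrt(4) = 1.0 and d2 scores 3 / sqrt(2) ≈ 2.12, and the ranking is the same with or without the query-norm division.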

static idf_weight_log(N, df)

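Returns a logarithmic idf (inverse document frequency) weight for a term with document frequency df in a collection of N documents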
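The exact formula is not stated here. A common logarithmic form is log(N / df), sketched below under that assumption with the natural log; a different base would only rescale all weights by a constant and so would not change rankings.

```python
import math


def idf_weight_log(N, df):
    # Assumed form: log(N / df). Requires 0 < df <= N; a term that occurs
    # in every document gets weight 0, and rarer terms get larger weights.
    return math.log(N / df)


rare = idf_weight_log(100, 1)
common = idf_weight_log(100, 10)
everywhere = idf_weight_log(100, 100)
```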

score_docs(query_vec, postings_lists, N, get_doc_freq, get_doc_len, **extra)

Scores documents’ similarities to query using cosine similarity in a vector space model. Uses tf.idf weighting.

An individual term hit is scored as:

idf * self.tf_weight(q_tf) * self.tf_weight(d_tf)

The overall score for a doc is given by the sum of the term-hit scores
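The term-hit formula and summation can be sketched as follows. Assumptions, all hypothetical: postings lists are (term, [(docid, d_tf), ...]) pairs, get_doc_freq is any callable returning a term's document frequency, idf is log(N / df), and tf_weight defaults to the raw tf. For clarity the sketch omits the document-length normalization described in the class notes (the get_doc_len parameter).

```python
import math
from collections import defaultdict


def tf_weight_raw(tf):
    return tf


def tfidf_score_docs(query_vec, postings_lists, N, get_doc_freq, tf_weight=tf_weight_raw):
    """Hypothetical sketch: sum of idf * tf_weight(q_tf) * tf_weight(d_tf) per doc."""
    scores = defaultdict(float)
    for term, postings in postings_lists:
        q_tf = query_vec[term]  # postings_lists holds only query terms
        idf = math.log(N / get_doc_freq(term))  # assumed idf form
        for docid, d_tf in postings:
            scores[docid] += idf * tf_weight(q_tf) * tf_weight(d_tf)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


doc_freq = {"cat": 2, "dog": 1}
postings = [("cat", [("d1", 2), ("d2", 1)]), ("dog", [("d2", 1)])]
result = tfidf_score_docs({"cat": 1, "dog": 1}, postings, N=4, get_doc_freq=doc_freq.__getitem__)
```

With N = 4, d1 scores log 2 × 2 ≈ 1.39 while d2 scores log 2 + log 4 ≈ 2.08: d2's single hit on the rarer term "dog" outweighs d1's two hits on "cat".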

static tf_weight_log(tf)

Returns sublinear scaling of tf: 1+log(tf)

static tf_weight_raw(tf)

Returns unscaled tf
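The two tf weighting options can be compared with a short sketch, assuming the natural log (the base is not stated). Module-level functions stand in for the static methods; 1 + log(tf) is only meaningful for tf >= 1.

```python
import math


def tf_weight_log(tf):
    # Sublinear scaling 1 + log(tf); natural log assumed, tf must be >= 1.
    return 1 + math.log(tf)


def tf_weight_raw(tf):
    # Unscaled term frequency.
    return tf


# (tf, raw weight, log weight rounded to 3 places) for growing term counts.
comparison = [(tf, tf_weight_raw(tf), round(tf_weight_log(tf), 3)) for tf in (1, 10, 100)]
```

A hundredfold increase in tf multiplies the raw weight by 100 but the log weight by only about 5.6, so a document cannot dominate the ranking merely by repeating one query term.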
