The SimIndex Class

SimIndex

See pysimsearch.sim_index.MemorySimIndex for sample usage

class pysimsearch.sim_index.SimIndex

Base class for similarity indexes

Defines the interface and provides default implementations for several methods.

Instance Attributes:
config: dictionary of configuration variables

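The split between an abstract interface and inherited default methods can be sketched with Python's `abc` module. In this minimal sketch, subclasses must implement `query()`, while `query_by_string()` is a shared default built on top of it. The class names `SimIndexSketch` and `DotProductIndex`, the `"lowercase"` config key, the whitespace tokenizer, and the dot-product scoring are all illustrative assumptions, not pysimsearch's actual code.

```python
import abc
from collections import Counter


class SimIndexSketch(abc.ABC):
    """Hypothetical stand-in for SimIndex: an abstract interface plus defaults."""

    def __init__(self):
        # Mirrors the documented `config` attribute; the key is an assumption.
        self.config = {"lowercase": True}

    @abc.abstractmethod
    def query(self, query_vec):
        """Subclasses must return (docname, score) tuples sorted by score."""

    def query_by_string(self, query_string):
        # Default implementation shared by all subclasses: build a term
        # vector from the string, then delegate to the abstract query().
        if self.config["lowercase"]:
            query_string = query_string.lower()
        return self.query(Counter(query_string.split()))


class DotProductIndex(SimIndexSketch):
    """Toy concrete subclass that scores documents by term-count dot product."""

    def __init__(self, docs):
        super().__init__()
        self.docs = {name: Counter(text.lower().split()) for name, text in docs.items()}

    def query(self, query_vec):
        # Counter returns 0 for missing terms, so absent terms add nothing.
        scores = [(name, sum(n * vec[t] for t, n in query_vec.items()))
                  for name, vec in self.docs.items()]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Instantiating `SimIndexSketch` directly raises `TypeError` because `query()` is abstract; `DotProductIndex` only has to supply `query()` to inherit a working `query_by_string()`.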
docid_to_name(docid)

Returns the document name for a given docid.

docids_with_terms(terms)

Returns a list of docids of the documents containing all of the given terms.
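An "all terms" lookup like this is commonly done by intersecting postings lists. A minimal sketch, assuming the postings are held in a plain dict from term to the (docid, frequency) tuples that postings_list() documents; `docids_with_terms` here is a hypothetical free function, not pysimsearch's implementation.

```python
def docids_with_terms(postings, terms):
    """Return sorted docids of the documents containing *all* terms.

    `postings` maps term -> list of (docid, frequency) tuples; holding it
    in a plain dict is an assumption made for illustration.
    """
    terms = list(terms)
    if not terms:
        # Design choice for this sketch: no terms means no matches.
        return []
    # Start from the first term's docids and intersect with each other term's.
    result = {docid for docid, _ in postings.get(terms[0], [])}
    for term in terms[1:]:
        result &= {docid for docid, _ in postings.get(term, [])}
    return sorted(result)
```

A term missing from the index contributes an empty set, so any query containing it returns no docids.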

docnames_with_terms(*terms)

Returns an iterable of the names of the documents containing the given terms.
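Note that docnames_with_terms() takes its terms as variadic arguments (*terms), while docids_with_terms() takes a single iterable. A minimal sketch of how the name-returning variant can be layered over the two docid-level primitives; the `NamesDemo` class and its list-based storage are hypothetical.

```python
class NamesDemo:
    """Hypothetical index exposing the primitives docnames_with_terms() needs."""

    def __init__(self, docs):
        # Illustrative storage: docid is the position in insertion order.
        self._names = list(docs)
        self._terms = [set(text.split()) for text in docs.values()]

    def docid_to_name(self, docid):
        return self._names[docid]

    def docids_with_terms(self, terms):
        # Keep docids whose term set contains every requested term.
        return [d for d, ts in enumerate(self._terms) if all(t in ts for t in terms)]

    def docnames_with_terms(self, *terms):
        # Variadic wrapper: *terms arrives as a tuple, which is passed on as the
        # single iterable docids_with_terms() expects; ids are mapped to names lazily.
        return (self.docid_to_name(d) for d in self.docids_with_terms(terms))
```

Returning a generator matches the documented "iterable" return type without building an intermediate list of names.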

get_local_N()

Returns the local number of documents.

get_local_df_map()

Returns local document-frequency (df) statistics.
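A document count N and a per-term df map are the usual inputs to inverse document frequency (IDF) weighting. A minimal sketch of turning them into weights, assuming the plain formula idf(t) = log(N / df(t)); pysimsearch's scorers may use a different or smoothed variant, and `idf_weights` is a hypothetical helper.

```python
import math


def idf_weights(n_docs, df_map):
    """Compute idf(t) = log(N / df(t)) for every term with a positive df.

    n_docs plays the role of get_local_N() and df_map that of
    get_local_df_map(); the unsmoothed formula is an assumption.
    """
    return {term: math.log(n_docs / df) for term, df in df_map.items() if df > 0}
```

A term that appears in every document gets weight 0, so it contributes nothing to similarity scores.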

get_name_to_docid_map()

Returns the local mapping of document names to docids.

index_filenames(*filenames)

Builds a similarity index over the files given by filenames.

Convenience method that wraps index_files().

Params:
*filenames: one or more filenames to add to the index.

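A minimal sketch of the wrapper relationship: index_filenames() opens each path and hands (filename, file) pairs to index_files(). The `FileIndexDemo` class, its Counter-per-file storage, and the UTF-8 encoding are assumptions for illustration only.

```python
from collections import Counter


class FileIndexDemo:
    """Hypothetical index that stores a term-count vector per file name."""

    def __init__(self):
        self.docs = {}

    def index_files(self, named_files):
        # Consume each (name, file) pair; `with` closes the file afterwards.
        for name, f in named_files:
            with f:
                self.docs[name] = Counter(f.read().split())

    def index_filenames(self, *filenames):
        # Convenience wrapper: a generator opens each file only when
        # index_files() reaches it, so at most one is open at a time.
        return self.index_files((name, open(name, encoding="utf-8")) for name in filenames)
```

Opening files lazily through a generator avoids exhausting file descriptors when indexing many files at once.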
index_files(named_files)

Adds files given in named_files to the index.

Params:
named_files: iterable of (filename, file) pairs.
Takes ownership of (and consumes) the files.

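"Takes ownership of (and consumes) the files" means the caller should not reuse a file after passing it in: it is read to the end and closed. A minimal sketch of that contract, assuming the index is a plain dict from name to term counts and that tokenization is a lowercase whitespace split (both hypothetical).

```python
from collections import Counter


def index_files(index, named_files):
    """Add (name, file) pairs to `index`, a plain dict name -> Counter.

    The callee takes ownership of each file: it reads it to the end and
    closes it. The dict-based index and tokenizer are assumptions.
    """
    for name, f in named_files:
        with f:  # closes the file even if tokenizing raises
            index[name] = Counter(f.read().lower().split())
```

Any file-like object that supports the context-manager protocol, such as `io.StringIO`, works here.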
index_string_buffers(named_string_buffers)

Adds string buffers to the index.

Params:
named_string_buffers: iterable of (name, string) tuples, where the string contains the data to index.

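One natural way to index in-memory strings is to wrap each one in an `io.StringIO` so the file-based path can consume it unchanged. A minimal sketch under that assumption, with both functions written as hypothetical free functions over a plain dict index.

```python
import io
from collections import Counter


def index_files(index, named_files):
    # Same contract as index_files(): consume and close each file.
    for name, f in named_files:
        with f:
            index[name] = Counter(f.read().split())


def index_string_buffers(index, named_string_buffers):
    # Wrap each string in an in-memory file so index_files() can consume it.
    index_files(index, ((name, io.StringIO(text)) for name, text in named_string_buffers))
```

Reusing the file path keeps tokenization identical for files and strings.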
name_to_docid(name)

Returns the docid for a given document name.
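name_to_docid(), docid_to_name(), and get_name_to_docid_map() together describe a two-way mapping between document names and docids. A minimal sketch, assuming dense integer docids assigned in insertion order; the `DocIdRegistry` class and its `add()` method are hypothetical.

```python
class DocIdRegistry:
    """Hypothetical bidirectional name <-> docid mapping with dense integer ids."""

    def __init__(self):
        self._name_to_docid = {}
        self._docid_to_name = []

    def add(self, name):
        # Assign the next integer id; re-adding a known name returns its existing id.
        if name not in self._name_to_docid:
            self._name_to_docid[name] = len(self._docid_to_name)
            self._docid_to_name.append(name)
        return self._name_to_docid[name]

    def name_to_docid(self, name):
        return self._name_to_docid[name]

    def docid_to_name(self, docid):
        return self._docid_to_name[docid]

    def get_name_to_docid_map(self):
        # Return a copy so callers cannot corrupt the internal mapping.
        return dict(self._name_to_docid)
```

Dense ids let the reverse lookup be a plain list index rather than a second dict.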

postings_list(term)

Returns a list of (docid, frequency) tuples for the documents that contain term.
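A postings list is one row of an inverted index. A minimal sketch of building such an index from tokenized documents and reading one term's postings, assuming docids are list positions; `build_postings` and the free function `postings_list` are hypothetical helpers.

```python
from collections import Counter, defaultdict


def build_postings(docs):
    """Map term -> list of (docid, frequency) from a list of token lists.

    docid is the document's position in `docs`; this layout is an assumption
    chosen to match the (docid, frequency) tuples postings_list() returns.
    """
    postings = defaultdict(list)
    for docid, tokens in enumerate(docs):
        for term, freq in Counter(tokens).items():
            postings[term].append((docid, freq))
    return postings


def postings_list(postings, term):
    # Unknown terms yield an empty list rather than raising.
    return postings.get(term, [])
```

Because documents are visited in docid order, each postings list comes out sorted by docid.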

query(query_vec)

Finds documents similar to query_vec

Params:
query_vec: term vector representing query document
Returns:
An iterable of (docname, score) tuples sorted by score

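The actual scoring function is pluggable (see set_query_scorer()), so the following is only one possibility: cosine similarity over raw term counts, applied by brute force to every document. Representing term vectors as plain dicts, dropping zero-score documents, and sorting best-first are assumptions of this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse term vectors (dicts term -> weight)."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def query(docs, query_vec):
    """Return (docname, score) tuples for docs with a positive score, best first.

    `docs` maps docname -> term vector. Scoring every document is a
    brute-force stand-in for an index-backed scorer.
    """
    scored = ((name, cosine(query_vec, vec)) for name, vec in docs.items())
    return sorted((pair for pair in scored if pair[1] > 0),
                  key=lambda pair: pair[1], reverse=True)
```

A real index would use postings lists to touch only documents sharing at least one term with the query.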
query_by_string(query_string)

Finds documents similar to query_string.

Convenience method that calls self.query()

Params:
query_string: the query given as a string

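query_by_string() needs some way to turn text into a term vector before calling query(). The tokenizer is not described on this page, so `term_vec()` here is a hypothetical one (regex word split with optional lowercasing), paired with a wrapper that accepts any query function.

```python
import re
from collections import Counter


def term_vec(text, lowercase=True):
    """Hypothetical tokenizer: word characters only, optionally lowercased."""
    if lowercase:
        text = text.lower()
    return Counter(re.findall(r"\w+", text))


def query_by_string(query_fn, query_string):
    # Convenience wrapper: turn the string into a term vector, then delegate.
    return query_fn(term_vec(query_string))
```

Passing the query function in keeps the wrapper independent of any particular index class.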
set_global_N(N)

Sets the global number of documents.

set_global_df_map(df_map)

Sets global document-frequency (df) statistics.
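The local/global split suggests a sharded setup in which each index reports its own counts and receives corpus-wide totals back, so that IDF-style weights agree across shards. A sketch of that round trip, assuming the totals are simple sums; the `Shard` class and `sync_global_stats()` are hypothetical and not part of pysimsearch.

```python
from collections import Counter


class Shard:
    """Hypothetical shard exposing the local/global statistics methods."""

    def __init__(self, docs):
        self.docs = [set(text.split()) for text in docs]
        self.global_N = None
        self.global_df_map = None

    def get_local_N(self):
        return len(self.docs)

    def get_local_df_map(self):
        # df(t) = number of local documents containing t at least once.
        return Counter(t for terms in self.docs for t in terms)

    def set_global_N(self, N):
        self.global_N = N

    def set_global_df_map(self, df_map):
        self.global_df_map = df_map


def sync_global_stats(shards):
    # Sum local counts across shards, then broadcast the totals back.
    total_N = sum(s.get_local_N() for s in shards)
    total_df = Counter()
    for s in shards:
        total_df.update(s.get_local_df_map())
    for s in shards:
        s.set_global_N(total_N)
        s.set_global_df_map(dict(total_df))
```

Summing is valid because each document lives on exactly one shard, so per-shard df counts never overlap.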

set_query_scorer(query_scorer)

Sets the query_scorer.

Params:
query_scorer: if a string, it is treated as a scorer name; otherwise it is assumed to be a scorer object derived from query_scorer.QueryScorer.
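A minimal sketch of the "name or object" dispatch described above. The `QueryScorer` class here is a hypothetical stand-in for query_scorer.QueryScorer, and the `"dot"` registry entry, `DotProductScorer`, and `ScorerHolder` are illustrative; the real scorer names may differ.

```python
class QueryScorer:
    """Hypothetical base class standing in for query_scorer.QueryScorer."""

    def score(self, query_vec, doc_vec):
        raise NotImplementedError


class DotProductScorer(QueryScorer):
    def score(self, query_vec, doc_vec):
        return sum(w * doc_vec.get(t, 0) for t, w in query_vec.items())


# Hypothetical name -> class registry; the real scorer names may differ.
SCORERS = {"dot": DotProductScorer}


class ScorerHolder:
    def __init__(self):
        self.query_scorer = None

    def set_query_scorer(self, query_scorer):
        # Strings are looked up as registered scorer names; anything else
        # must already be a QueryScorer instance.
        if isinstance(query_scorer, str):
            query_scorer = SCORERS[query_scorer]()
        elif not isinstance(query_scorer, QueryScorer):
            raise TypeError("expected a scorer name or a QueryScorer instance")
        self.query_scorer = query_scorer
```

Rejecting other types early surfaces configuration mistakes at setup time rather than at the first query.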
