The MapSimIndex Class

MapSimIndex

See MemorySimIndex for sample usage

class pysimsearch.sim_index.MapSimIndex(name_to_docid_map=None, docid_to_name_map=None, docid_to_feature_map=None, term_index=None, df_map=None, doc_len_map=None)

Inherits from pysimsearch.sim_index.SimIndex.

Simple implementation of the SimIndex interface backed by dict-like objects (MutableMapping). By default it uses dict, in which case the indexes are held in memory.

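NOTE: to ensure compatibility with arbitrary dict-like objects, including persistent shelves, all mutations must be done by assignment, because a persistent mapping may return a copy of the stored value rather than the stored value itself. For example, do not do: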

map[key].extend([a, b])

Instead, do the equivalent of:

map[key] += [a, b]  # same as: map[key] = map[key].__iadd__([a, b])

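
To see why this matters, the sketch below uses the standard-library shelve module as one example of a persistent dict-like object (it is plain stdlib code, not pysimsearch code). With the default writeback=False, each shelf[key] lookup unpickles a fresh copy, so an in-place extend() is silently lost while augmented assignment writes the result back.

```python
import os
import shelve
import tempfile

# With writeback=False (the default), shelf[key] returns a freshly unpickled
# copy of the stored value, so mutating that copy never reaches the store.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "postings")
    with shelve.open(path) as shelf:
        shelf["term"] = [1]

        # In-place mutation: extends a temporary copy that is then discarded.
        shelf["term"].extend([2, 3])
        after_extend = shelf["term"]

        # Augmented assignment: __iadd__ on the copy, then shelf["term"] = result.
        shelf["term"] += [2, 3]
        after_iadd = shelf["term"]

print(after_extend)  # [1]
print(after_iadd)    # [1, 2, 3]
```

With a plain in-memory dict both forms behave the same, which is why the in-place form can look correct until the index is switched to a persistent backend.
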
docids_with_terms(terms)

Returns a list of the docids of documents containing all of the given terms.

docnames_with_terms(*terms)

Returns an iterable of the docnames of documents containing the given terms.
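
A conjunctive lookup like docids_with_terms() is commonly implemented by intersecting postings lists. The following is a minimal sketch under stated assumptions, not the library's code: the toy term_index (term to list of (docid, freq) pairs, matching what postings_list() is documented to return) and docid_to_name_map are hypothetical stand-ins for the constructor's mappings, and both function names ending in _sketch are illustrative only.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

# Hypothetical toy indexes shaped after the constructor parameters.
term_index: Dict[str, List[Tuple[int, int]]] = {
    "apple": [(0, 2), (1, 1), (3, 4)],
    "pie": [(1, 3), (3, 1)],
    "tart": [(2, 1)],
}
docid_to_name_map: Dict[int, str] = {0: "a.txt", 1: "b.txt", 2: "c.txt", 3: "d.txt"}


def docids_with_terms_sketch(term_index: Mapping[str, List[Tuple[int, int]]],
                             terms: Iterable[str]) -> List[int]:
    """Return docids of documents containing *all* terms (postings intersection)."""
    result = None
    for term in terms:
        docids = {docid for docid, _freq in term_index.get(term, [])}
        result = docids if result is None else result & docids
        if not result:
            return []  # intersection already empty; no need to read more postings
    return sorted(result) if result else []


def docnames_with_terms_sketch(term_index, docid_to_name_map, *terms):
    """Map the matching docids back to document names."""
    return [docid_to_name_map[docid]
            for docid in docids_with_terms_sketch(term_index, terms)]


print(docids_with_terms_sketch(term_index, ["apple", "pie"]))  # [1, 3]
print(docnames_with_terms_sketch(term_index, docid_to_name_map, "apple", "pie"))  # ['b.txt', 'd.txt']
```

The sketch mirrors the documented calling conventions: the docid variant takes a single iterable of terms, while the docname variant takes the terms as separate arguments.
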

get_local_N()

Returns the local number of documents, i.e., the number of documents in this index.

index_filenames(*filenames)

Builds a similarity index over the files given by filenames.

Convenience method that wraps index_files().

Params:
*filenames: the filenames to add to the index, passed as separate arguments.

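
One way such a wrapper can open its inputs and hand them on as (filename, file) pairs is sketched below. This is an assumption about the general shape of the wrapper, not the library's implementation: index_filenames_sketch is hypothetical, and it takes the index_files callable as an argument so the example runs on its own. contextlib.ExitStack keeps every file open while index_files runs and closes them all afterwards, even if indexing raises.

```python
import contextlib
import os
import tempfile
from typing import Callable, Iterable, List, TextIO, Tuple


def index_filenames_sketch(index_files: Callable[[Iterable[Tuple[str, TextIO]]], None],
                           *filenames: str) -> None:
    """Hypothetical wrapper: open each file and pass (filename, file) pairs on."""
    with contextlib.ExitStack() as stack:
        named_files = [(name, stack.enter_context(open(name, encoding="utf-8")))
                       for name in filenames]
        index_files(named_files)
    # Every file is closed here, whether or not index_files raised.


# Usage: a stand-in for index_files that records each name and its contents.
seen: List[Tuple[str, str]] = []

def record_files(named_files):
    for name, f in named_files:
        seen.append((os.path.basename(name), f.read()))

with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for name, text in [("a.txt", "apple pie"), ("b.txt", "apple tart")]:
        path = os.path.join(tmpdir, name)
        with open(path, "w", encoding="utf-8") as out:
            out.write(text)
        paths.append(path)
    index_filenames_sketch(record_files, *paths)

print(seen)  # [('a.txt', 'apple pie'), ('b.txt', 'apple tart')]
```
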
index_files(named_files)

Builds a similarity index over the collection given in named_files, an iterable of (filename, file) pairs.
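
A minimal sketch of what indexing a collection of (filename, file) pairs can involve, assuming the plain-dict backend: each file gets the next free docid, its term counts are appended to the postings in term_index, and df_map counts the documents per term. The tokenizer (term_vector), the function name, and the exact contents stored in each map are hypothetical; the real class may tokenize differently and also fills maps not shown here, such as doc_len_map.

```python
import io
import re
from collections import Counter
from typing import Dict, Iterable, List, TextIO, Tuple


def term_vector(text: str) -> Counter:
    """Hypothetical tokenizer: counts lowercase word-character runs."""
    return Counter(re.findall(r"\w+", text.lower()))


def index_files_sketch(named_files: Iterable[Tuple[str, TextIO]],
                       name_to_docid_map: Dict[str, int],
                       docid_to_name_map: Dict[int, str],
                       term_index: Dict[str, List[Tuple[int, int]]],
                       df_map: Dict[str, int]) -> None:
    """Assign each file a docid and add its term counts to the postings."""
    for name, f in named_files:
        docid = len(docid_to_name_map)  # next unused docid
        name_to_docid_map[name] = docid
        docid_to_name_map[docid] = name
        for term, freq in term_vector(f.read()).items():
            # Update by assignment, never in place, as the NOTE above requires.
            term_index[term] = term_index.get(term, []) + [(docid, freq)]
            df_map[term] = df_map.get(term, 0) + 1


name_to_docid: Dict[str, int] = {}
docid_to_name: Dict[int, str] = {}
term_index: Dict[str, List[Tuple[int, int]]] = {}
df_map: Dict[str, int] = {}

index_files_sketch(
    [("a", io.StringIO("Apple pie, apple tart")), ("b", io.StringIO("pie crust"))],
    name_to_docid, docid_to_name, term_index, df_map,
)
print(term_index["apple"])  # [(0, 2)]
print(term_index["pie"])    # [(0, 1), (1, 1)]
print(df_map["pie"])        # 2
```
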

index_string_buffers(named_string_buffers)

Adds string buffers to the index.

Params:
named_string_buffers: iterable of (name, string) tuples, where
the string contains the data to index.

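
A natural way to support string buffers is to wrap each string in a file-like object and reuse the file-based path, so only one tokenizing code path is needed. The sketch below shows that idea with io.StringIO; whether the real method does exactly this is an assumption, and index_string_buffers_sketch is a hypothetical name that receives the index_files callable as a parameter.

```python
import io
from typing import Callable, Iterable, List, TextIO, Tuple


def index_string_buffers_sketch(index_files: Callable[[Iterable[Tuple[str, TextIO]]], None],
                                named_string_buffers: Iterable[Tuple[str, str]]) -> None:
    """Wrap each (name, string) pair as (name, file-like object) and delegate."""
    index_files((name, io.StringIO(text)) for name, text in named_string_buffers)


# Usage with a stand-in index_files that records what it was given.
received: List[Tuple[str, str]] = []
index_string_buffers_sketch(
    lambda named_files: received.extend((n, f.read()) for n, f in named_files),
    [("doc1", "apple pie"), ("doc2", "apple tart")],
)
print(received)  # [('doc1', 'apple pie'), ('doc2', 'apple tart')]
```
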
postings_list(term)

Returns a list of (docid, freq) tuples for the documents containing term.
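
The (docid, freq) shape makes common collection statistics one-liners: the number of pairs is the term's document frequency, and the summed freq values give its total number of occurrences. In this sketch the toy term_index and postings_list_sketch are hypothetical, and returning an empty list for an unknown term (rather than raising KeyError) is an assumption about the real method.

```python
from typing import Dict, List, Mapping, Tuple

Postings = List[Tuple[int, int]]  # (docid, freq) pairs

# Hypothetical toy inverted index.
term_index: Dict[str, Postings] = {"apple": [(0, 2), (1, 1), (3, 4)], "tart": [(2, 1)]}


def postings_list_sketch(term_index: Mapping[str, Postings], term: str) -> Postings:
    # Unknown terms yield an empty list here (an assumption, see above).
    return list(term_index.get(term, []))


postings = postings_list_sketch(term_index, "apple")
df = len(postings)                                   # documents containing the term
total_freq = sum(freq for _docid, freq in postings)  # occurrences across the collection
print(postings, df, total_freq)  # [(0, 2), (1, 1), (3, 4)] 3 7
```
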

query(query_vec)

Finds documents similar to query_vec.

Params:
query_vec: term vector representing query document
Returns:
An iterable of (docname, score) tuples sorted by score

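
The actual scoring depends on the scorer chosen with set_query_scorer(), so the following is only one plausible scheme, shown as a sketch: tf-idf weighting with cosine normalization, accumulated term-at-a-time over the postings of the query terms. The function query_sketch, the idf formula log(N / df), and the descending sort order are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple

Postings = List[Tuple[int, int]]


def query_sketch(query_vec: Mapping[str, int],
                 term_index: Mapping[str, Postings],
                 docid_to_name_map: Mapping[int, str],
                 N: int) -> List[Tuple[str, float]]:
    """Hypothetical tf-idf cosine scorer with term-at-a-time accumulation."""

    def idf(term: str) -> float:
        df = len(term_index.get(term, []))
        return math.log(N / df) if df else 0.0

    # Squared norm of every document's tf-idf vector. Computed on the fly
    # here; a real index would more likely precompute per-document norms.
    doc_sq_norm: Dict[int, float] = defaultdict(float)
    for term, postings in term_index.items():
        w = idf(term)
        for docid, freq in postings:
            doc_sq_norm[docid] += (freq * w) ** 2

    # Walk only the postings of the query terms, accumulating dot products.
    dots: Dict[int, float] = defaultdict(float)
    q_sq_norm = 0.0
    for term, qfreq in query_vec.items():
        w = idf(term)
        q_sq_norm += (qfreq * w) ** 2
        for docid, freq in term_index.get(term, []):
            dots[docid] += (qfreq * w) * (freq * w)

    results = []
    for docid, dot in dots.items():
        denom = math.sqrt(q_sq_norm * doc_sq_norm[docid])
        if denom > 0:
            results.append((docid_to_name_map[docid], dot / denom))
    # Highest score first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


term_index = {"apple": [(0, 2), (1, 1)], "pie": [(1, 3)],
              "tart": [(0, 1), (2, 1)], "crust": [(2, 2)]}
names = {0: "a", 1: "b", 2: "c"}
results = query_sketch({"apple": 1, "tart": 1}, term_index, names, N=3)
print([name for name, _score in results])  # ['a', 'c', 'b']
```

Accumulating term-at-a-time means only documents sharing at least one query term are ever touched, which is the main reason inverted indexes make this kind of query cheap.
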
query_by_string(query_string)

Finds documents similar to query_string.

Convenience method that calls self.query()

Params:
query_string: the query given as a string

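
Conceptually, a string query only needs one extra step before query(): turning the string into a term vector. The sketch below illustrates that step with a hypothetical lowercase word tokenizer; query_by_string_sketch and its tokenizer are assumptions, and the library's own term-vector extraction may differ.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Mapping, Tuple


def query_by_string_sketch(query: Callable[[Mapping[str, int]], Iterable[Tuple[str, float]]],
                           query_string: str):
    # Hypothetical tokenizer; it must match the one used at indexing time.
    query_vec = Counter(re.findall(r"\w+", query_string.lower()))
    return query(query_vec)


# Usage with a stand-in query() that simply echoes the term vector it receives.
echo = query_by_string_sketch(lambda vec: sorted(vec.items()), "Apple pie, APPLE tart")
print(echo)  # [('apple', 2), ('pie', 1), ('tart', 1)]
```

Whatever tokenizer is used, queries and documents must be tokenized the same way; otherwise query terms will not line up with the terms in the index.
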
set_global_N(N)

Sets the global number of documents.
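
The page does not say how N is used. A common reason to separate a global count from the local one is inverse document frequency, idf = log(N / df), when several indexes each hold part of a collection. The sketch below rests on that assumption and shows why per-shard statistics make scores incomparable. It also sums df across shards, which is an additional assumption about what a consistent setup needs. The shard layout and numbers are hypothetical.

```python
import math
from typing import Dict, Tuple


def idf(df: int, N: int) -> float:
    """Inverse document frequency log(N / df); 0.0 for terms in no document."""
    return math.log(N / df) if df else 0.0


# Hypothetical two-shard collection: (local N, local df of one term) per shard.
shards: Dict[str, Tuple[int, int]] = {"shard_a": (1000, 10), "shard_b": (1000, 500)}

# Using only local statistics, the same term gets very different weights.
local_idfs = {name: idf(df, n) for name, (n, df) in shards.items()}

# Using collection-wide statistics gives every shard the same weight.
global_N = sum(n for n, _df in shards.values())  # e.g. the value passed to set_global_N
global_df = sum(df for _n, df in shards.values())
global_idf = idf(global_df, global_N)

print({k: round(v, 3) for k, v in local_idfs.items()})  # {'shard_a': 4.605, 'shard_b': 0.693}
```
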

set_query_scorer(query_scorer)

Sets the query scorer.

Params:
query_scorer: if a string, it is taken to be a scorer name;
otherwise it is assumed to be a scoring object derived from query_scorer.QueryScorer.
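
The documented dispatch (a string names a scorer, anything else is used as the scorer itself) can be sketched as follows. Every name here is hypothetical: the QueryScorer stand-in, the "cosine" entry in the SCORERS registry, and the choice to raise ValueError for an unknown name are illustrations, not the library's actual scorer names or error behavior.

```python
from typing import Callable, Dict, Optional, Union


class QueryScorer:
    """Hypothetical stand-in for the base class query_scorer.QueryScorer."""


class CosineScorer(QueryScorer):
    """Hypothetical concrete scorer; scoring logic omitted."""


# Hypothetical registry mapping scorer names to factories.
SCORERS: Dict[str, Callable[[], QueryScorer]] = {"cosine": CosineScorer}


class ScorerSlot:
    """Holds the active scorer, mimicking set_query_scorer's documented dispatch."""

    def __init__(self) -> None:
        self.query_scorer: Optional[QueryScorer] = None

    def set_query_scorer(self, query_scorer: Union[str, QueryScorer]) -> None:
        if isinstance(query_scorer, str):
            # A string is treated as a scorer name and looked up.
            try:
                self.query_scorer = SCORERS[query_scorer]()
            except KeyError:
                raise ValueError(f"unknown scorer name: {query_scorer!r}") from None
        else:
            # Anything else is assumed to already be a scorer object.
            self.query_scorer = query_scorer


slot = ScorerSlot()
slot.set_query_scorer("cosine")        # by name
custom = CosineScorer()
slot.set_query_scorer(custom)          # by object
print(slot.query_scorer is custom)     # True
```

Accepting either form lets callers use a registered scorer by name while still plugging in a custom QueryScorer subclass without registering it first.
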
