The SimIndex Class
SimIndex
See pysimsearch.sim_index.MemorySimIndex for sample usage
-
class pysimsearch.sim_index.SimIndex
Base class for similarity indexes.
Defines the interface and provides default implementations for
several methods.
- Instance Attributes:
- config: dictionary of configuration variables
-
docid_to_name(docid)
Returns document name for a given docid
-
docids_with_terms(terms)
Returns a list of docids of docs containing all terms
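A default docids_with_terms() can be expressed in terms of postings lists: a document contains all terms exactly when its docid appears in every term's postings list. The sketch below assumes the postings are held in a plain dict mapping each term to a list of (docid, frequency) tuples, the shape postings_list() is documented to return. The helper name docids_with_all_terms is hypothetical and not part of pysimsearch.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory postings: term -> [(docid, frequency), ...]
Postings = Dict[str, List[Tuple[int, int]]]


def docids_with_all_terms(postings: Postings, terms: Iterable[str]) -> List[int]:
    """Hypothetical helper: return sorted docids of docs containing every term."""
    result = None
    for term in terms:
        # Docids in this term's postings list; an unknown term matches nothing.
        docids = {docid for docid, _freq in postings.get(term, [])}
        result = docids if result is None else result & docids
        if not result:
            return []  # early exit: the intersection can only shrink
    return sorted(result) if result else []


postings = {
    "apple": [(0, 2), (1, 1), (3, 4)],
    "pie": [(1, 3), (3, 1)],
    "tart": [(2, 1)],
}
print(docids_with_all_terms(postings, ["apple", "pie"]))   # [1, 3]
print(docids_with_all_terms(postings, ["apple", "tart"]))  # []
```

Intersecting sets term by term lets the loop stop as soon as no candidate docid remains.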
-
docnames_with_terms(*terms)
Returns an iterable of docnames containing terms
-
get_local_N()
Returns the local number of documents
-
get_local_df_map()
Returns the local document-frequency (df) statistics
-
get_name_to_docid_map()
Returns the local mapping of document names to docids
-
index_filenames(*filenames)
Build a similarity index over files given by filenames
Convenience method that wraps index_files()
- Params:
- *filenames: list of filenames to add to the index.
-
index_files(named_files)
Adds files given in named_files to the index.
- Params:
- named_files: iterable of (filename, file) pairs. The index takes ownership of (and consumes) the files.
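Since index_filenames() wraps index_files(), the wrapper's job is roughly to turn filenames into (filename, file) pairs. The sketch below is illustrative only: named_files_from_filenames and consume_named_files are hypothetical helpers, the latter a toy stand-in for index_files() that reads and then closes each file it is handed. This is one way to honor the documented "takes ownership of (and consumes)" contract.

```python
import os
import tempfile
from typing import IO, Dict, Iterable, Iterator, Tuple


def named_files_from_filenames(*filenames: str) -> Iterator[Tuple[str, IO[str]]]:
    """Hypothetical helper: lazily yield (filename, open file) pairs."""
    for filename in filenames:
        yield filename, open(filename, encoding="utf-8")


def consume_named_files(named_files: Iterable[Tuple[str, IO[str]]]) -> Dict[str, str]:
    """Toy stand-in for index_files(): reads each file, then closes it,
    because the callee owns the files it is given."""
    contents = {}
    for name, f in named_files:
        with f:  # closes the file once it has been read
            contents[name] = f.read()
    return contents


# Usage: write two temporary files, then "index" them by filename.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, text in enumerate(["hello world", "hello again"]):
        path = os.path.join(tmp, f"doc{i}.txt")
        with open(path, "w", encoding="utf-8") as out:
            out.write(text)
        paths.append(path)
    indexed = consume_named_files(named_files_from_filenames(*paths))

print(sorted(indexed.values()))  # ['hello again', 'hello world']
```

Yielding the pairs lazily means only one file is open at a time, which matters when indexing many files.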
-
index_string_buffers(named_string_buffers)
Adds string buffers to the index.
- Params:
- named_string_buffers: iterable of (name, string) tuples, where the string contains the data to index.
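To index string buffers, each (name, string) tuple has to become a term vector under a fresh docid, with the name recorded so that docid_to_name() and name_to_docid() can map between the two. This section does not describe pysimsearch's tokenization, so the regex-based lowercase tokenizer below is an assumption, as are the helper names tokenize and index_string_buffers_toy.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple


def tokenize(text: str) -> Counter:
    """Hypothetical tokenizer: lowercase word tokens counted into a term vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def index_string_buffers_toy(named_string_buffers: Iterable[Tuple[str, str]]):
    """Toy version: assign docids in order and keep both name<->docid mappings."""
    name_to_docid: Dict[str, int] = {}
    docid_to_name: Dict[int, str] = {}
    term_vectors: Dict[int, Counter] = {}
    for name, text in named_string_buffers:
        docid = len(docid_to_name)  # docids are 0, 1, 2, ... in insertion order
        name_to_docid[name] = docid
        docid_to_name[docid] = name
        term_vectors[docid] = tokenize(text)
    return name_to_docid, docid_to_name, term_vectors


n2d, d2n, vecs = index_string_buffers_toy(
    [("a.txt", "The cat sat."), ("b.txt", "The cat, the hat.")]
)
print(n2d["b.txt"], d2n[0])  # 1 a.txt
print(vecs[1]["the"])        # 2
```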
-
name_to_docid(name)
Returns docid for a given document name
-
postings_list(term)
Returns a list of (docid, frequency) tuples for docs that contain term
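postings_list() returns, for one term, the (docid, frequency) tuples of the documents that contain it. Assuming the per-document term vectors are available as docid -> {term: frequency} dicts, the full set of postings lists is that mapping inverted. The build_postings helper below is hypothetical, not a pysimsearch function.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple


def build_postings(
    term_vectors: Mapping[int, Mapping[str, int]]
) -> Dict[str, List[Tuple[int, int]]]:
    """Hypothetical helper: invert docid -> {term: freq} into term -> [(docid, freq)]."""
    postings: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for docid in sorted(term_vectors):  # sorted so each postings list is in docid order
        for term, freq in term_vectors[docid].items():
            postings[term].append((docid, freq))
    return dict(postings)


term_vectors = {0: {"cat": 1, "sat": 1}, 1: {"cat": 2, "hat": 1}}
postings = build_postings(term_vectors)
print(postings["cat"])          # [(0, 1), (1, 2)]
print(postings.get("dog", []))  # [] (a term in no document has no postings)
```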
-
query(query_vec)
Finds documents similar to query_vec
- Params:
- query_vec: term vector representing query document
- Returns:
- An iterable of (docname, score) tuples sorted by score
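The scoring function behind query() depends on the configured query scorer and is not specified in this section. Purely for illustration, the sketch below assumes cosine similarity over sparse term vectors and sorts results highest score first. The function names and the in-memory docs dict are hypothetical.

```python
import math
from typing import List, Mapping, Tuple


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_toy(
    query_vec: Mapping[str, float], docs: Mapping[str, Mapping[str, float]]
) -> List[Tuple[str, float]]:
    """Toy query(): score every doc, return (docname, score) sorted highest first."""
    scores = [(name, cosine(query_vec, vec)) for name, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {
    "a.txt": {"cat": 1, "sat": 1},
    "b.txt": {"cat": 2, "hat": 1},
    "c.txt": {"dog": 1},
}
results = query_toy({"cat": 1, "hat": 1}, docs)
print([name for name, _ in results])  # ['b.txt', 'a.txt', 'c.txt']
```

A real index would typically score only documents reachable through the query terms' postings lists rather than every document, as done here for brevity.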
-
query_by_string(query_string)
Finds documents similar to query_string.
Convenience method that calls self.query()
- Params:
- query_string: the query given as a string
-
set_global_N(N)
Sets the global number of documents
-
set_global_df_map(df_map)
Sets the global document-frequency (df) statistics
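Pairing get_local_N() and get_local_df_map() with set_global_N() and set_global_df_map() suggests that an index can hold one shard of a larger collection while scoring with collection-wide statistics. Assuming that use, and assuming every document lives in exactly one shard, the global values are sums of the local ones. The merge_shard_stats helper below is hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def merge_shard_stats(
    shard_stats: Iterable[Tuple[int, Mapping[str, int]]]
) -> Tuple[int, Counter]:
    """Sum per-shard (N, df_map) pairs into global (N, df_map).

    Assumes each document lives in exactly one shard; otherwise document
    counts and document frequencies would be double-counted.
    """
    global_n = 0
    global_df: Counter = Counter()
    for local_n, local_df in shard_stats:
        global_n += local_n
        global_df.update(local_df)  # Counter.update adds counts rather than replacing them
    return global_n, global_df


shard_a = (2, {"cat": 2, "sat": 1})
shard_b = (3, {"cat": 1, "hat": 3})
N, df = merge_shard_stats([shard_a, shard_b])
print(N, df["cat"], df["hat"])  # 5 3 3
```

Under this assumption, each shard would then be given the merged values via set_global_N(N) and set_global_df_map(df).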
-
set_query_scorer(query_scorer)
Sets the query scorer
- Params:
- query_scorer: if a string, it is treated as a scorer name; otherwise it is assumed to be a scoring object of base type query_scorer.QueryScorer.
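The rule set_query_scorer() documents (a string names a scorer; anything else is used as the scorer object itself) is a simple type dispatch. In the sketch below, the QueryScorer stand-in, its score() method, the DotProductScorer class, the SCORERS registry, and resolve_query_scorer are all hypothetical. The real query_scorer.QueryScorer interface is not described in this section.

```python
from typing import Dict, Mapping, Type, Union


class QueryScorer:
    """Stand-in for query_scorer.QueryScorer (hypothetical, illustration only)."""

    def score(self, query_vec: Mapping[str, float], doc_vec: Mapping[str, float]) -> float:
        raise NotImplementedError


class DotProductScorer(QueryScorer):
    """Hypothetical scorer: unnormalized dot product of term vectors."""

    def score(self, query_vec, doc_vec):
        return sum(w * doc_vec.get(t, 0) for t, w in query_vec.items())


# Hypothetical name -> class registry used to resolve string arguments.
SCORERS: Dict[str, Type[QueryScorer]] = {"dot_product": DotProductScorer}


def resolve_query_scorer(query_scorer: Union[str, QueryScorer]) -> QueryScorer:
    """Strings are looked up as scorer names; anything else is used as-is."""
    if isinstance(query_scorer, str):
        try:
            return SCORERS[query_scorer]()
        except KeyError:
            raise ValueError(f"unknown scorer name: {query_scorer!r}") from None
    return query_scorer


scorer = resolve_query_scorer("dot_product")
print(scorer.score({"cat": 1}, {"cat": 2, "hat": 1}))  # 2
custom = DotProductScorer()
print(resolve_query_scorer(custom) is custom)  # True
```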