The ShelfSimIndex Class

ShelfSimIndex

Sample usage:

from pprint import pprint
from pysimsearch.sim_index import ShelfSimIndex
from pysimsearch import doc_reader

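# The constructor takes a filename and a flag (see the class signature
# below). The path is only an example; 'n' is assumed to be a
# shelve.open()-style flag meaning "create a new, empty index".
sim_index = ShelfSimIndex('/tmp/sim_index', 'n')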
sim_index.index_filenames('http://www.stanford.edu/',
                          'http://www.berkeley.edu',
                          'http://www.ucla.edu',
                          'http://www.mit.edu')
pprint(sim_index.postings_list('university'))
pprint(list(sim_index.docnames_with_terms('university', 'california')))

sim_index.set_query_scorer('simple_count')
pprint(list(sim_index.query_by_string("stanford university")))

class pysimsearch.sim_index.ShelfSimIndex(filename, flag)

Inherits from pysimsearch.sim_index.MapSimIndex.

Shelf-based implementation of SimIndex. Indexes are backed with persistent shelve.DbfilenameShelf objects.
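
Because the index lives in shelve.DbfilenameShelf objects, data written in one session can be reopened in a later one. The standard-library sketch below illustrates only that persistence behaviour; the file location and the term to (docid, freq) layout are hypothetical and not necessarily how ShelfSimIndex organizes its shelves.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf file.
path = os.path.join(tempfile.mkdtemp(), "postings")

# flag="n" always creates a new, empty database.
with shelve.open(path, flag="n") as db:
    # Hypothetical layout: term -> list of (docid, freq) tuples.
    db["university"] = [(0, 3), (2, 1)]
    db["california"] = [(1, 2), (2, 4)]

# Reopen read-only: the data survived closing the shelf.
with shelve.open(path, flag="r") as db:
    restored = dict(db)

print(restored["university"])  # [(0, 3), (2, 1)]
```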

docids_with_terms(terms)

Returns a list of the docids of documents containing all of the terms in terms (a sequence of terms).

docnames_with_terms(*terms)

Returns an iterable of the names of documents containing all of the given terms, which are passed as separate arguments.
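
Both lookups come down to intersecting postings lists. A minimal in-memory sketch, assuming a hypothetical postings dict (term to list of (docid, freq)) and a hypothetical docid-to-name table; it also mirrors the calling difference between the two methods, where one takes a sequence and the other takes varargs.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical toy index: term -> list of (docid, freq) postings.
POSTINGS: Dict[str, List[Tuple[int, int]]] = {
    "university": [(0, 3), (1, 1), (2, 5)],
    "california": [(1, 2), (2, 1)],
    "stanford": [(0, 4)],
}
# Hypothetical docid -> docname table.
DOCNAMES = {0: "doc0", 1: "doc1", 2: "doc2"}


def docids_with_terms(terms: Sequence[str]) -> List[int]:
    """Return sorted docids of documents containing every term."""
    if not terms:
        return []
    docids = None
    for term in terms:
        ids = {docid for docid, _freq in POSTINGS.get(term, [])}
        docids = ids if docids is None else docids & ids
        if not docids:
            break  # empty intersection: later terms cannot add matches
    return sorted(docids)


def docnames_with_terms(*terms: str) -> Iterator[str]:
    """Lazily map the matching docids to document names."""
    return (DOCNAMES[docid] for docid in docids_with_terms(terms))


print(docids_with_terms(["university", "california"]))     # [1, 2]
print(list(docnames_with_terms("university", "california")))  # ['doc1', 'doc2']
```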

get_local_N()

Returns the number of documents in this (local) index.

index_filenames(*filenames)

Builds a similarity index over the files given by filenames.

Convenience method that wraps index_files()

Params:
*filenames: one or more filenames (or URLs, as in the sample usage above) to add to the index, passed as separate arguments.

index_files(named_files)

Builds a similarity index over the collection given in named_files.

named_files is an iterable of (filename, file) pairs.
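
To make the (filename, file) contract and the (docid, freq) postings concrete, here is a toy, dictionary-based index_files. The ToyIndex class, the lowercase alphanumeric tokenizer, and sequential docid assignment are all assumptions for illustration, not pysimsearch internals.

```python
import io
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple


class ToyIndex:
    """Hypothetical in-memory stand-in for a SimIndex."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        self.docnames: List[str] = []  # position in this list == docid

    def index_files(self, named_files: Iterable[Tuple[str, TextIO]]) -> None:
        for name, f in named_files:
            docid = len(self.docnames)
            self.docnames.append(name)
            # Assumed tokenizer: lowercase runs of letters and digits.
            counts = Counter(re.findall(r"[a-z0-9]+", f.read().lower()))
            for term, freq in counts.items():
                self.postings[term].append((docid, freq))

    def postings_list(self, term: str) -> List[Tuple[int, int]]:
        # .get() avoids inserting empty entries into the defaultdict.
        return list(self.postings.get(term, []))


index = ToyIndex()
index.index_files([
    ("a.txt", io.StringIO("Stanford University, a university in California")),
    ("b.txt", io.StringIO("UC Berkeley is in California")),
])
print(index.postings_list("university"))  # [(0, 2)]
print(index.postings_list("california"))  # [(0, 1), (1, 1)]
```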

index_string_buffers(named_string_buffers)

Adds string buffers to the index.

Params:
named_string_buffers: iterable of (name, string) tuples, where
the string contains the data to index.
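
One plausible way to support in-memory strings (an assumption, not necessarily pysimsearch's code) is to wrap each string in io.StringIO so it satisfies the same (name, file) contract that index_files() expects. The index_files stand-in below is hypothetical and only records what it receives.

```python
import io
from typing import Iterable, List, TextIO, Tuple


def index_files(named_files: Iterable[Tuple[str, TextIO]]) -> List[Tuple[str, str]]:
    """Hypothetical stand-in: read each file and return (name, contents)."""
    return [(name, f.read()) for name, f in named_files]


def index_string_buffers(named_string_buffers: Iterable[Tuple[str, str]]):
    """Adapt (name, string) pairs into (name, file-like) pairs."""
    named_files = ((name, io.StringIO(text)) for name, text in named_string_buffers)
    return index_files(named_files)


print(index_string_buffers([("doc1", "stanford university"), ("doc2", "mit")]))
# [('doc1', 'stanford university'), ('doc2', 'mit')]
```

With this design, tokenization and indexing live in one place (index_files), and the string entry point is only an adapter.
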

postings_list(term)

Returns a list of (docid, freq) tuples for the documents containing term, where freq is the number of occurrences of term in that document.

query(query_vec)

Finds documents similar to query_vec

Params:
query_vec: term vector representing query document
Returns:
An iterable of (docname, score) tuples sorted by score, best matches first.
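
A sketch of the query contract, assuming term vectors are dicts of term to count. Cosine similarity is used here as a stand-in scorer; the toy documents, the function signature, and the choice to drop zero-score documents are assumptions, and pysimsearch's configured scorer (e.g. 'simple_count') may compute something different.

```python
import math
from typing import Dict, List, Tuple

TermVec = Dict[str, int]

# Hypothetical indexed documents: docname -> term vector.
DOCS: Dict[str, TermVec] = {
    "doc0": {"stanford": 2, "university": 1},
    "doc1": {"berkeley": 1, "university": 1, "california": 1},
    "doc2": {"mit": 1},
}


def cosine(a: TermVec, b: TermVec) -> float:
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query(query_vec: TermVec) -> List[Tuple[str, float]]:
    """Return (docname, score) pairs with nonzero score, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in DOCS.items()]
    return sorted((p for p in scored if p[1] > 0), key=lambda p: p[1], reverse=True)


for name, score in query({"stanford": 1, "university": 1}):
    print(name, round(score, 3))
# doc0 0.949
# doc1 0.408
```
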

query_by_string(query_string)

Finds documents similar to query_string.

Convenience method that converts query_string into a term vector and passes it to self.query().

Params:
query_string: the query given as a string
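
The only extra work query_by_string does is turning text into a term vector. A minimal sketch of that conversion, assuming a simple lowercase, alphanumeric tokenizer (pysimsearch's own tokenizer may differ); string_to_term_vec is a hypothetical name.

```python
import re
from collections import Counter
from typing import Dict


def string_to_term_vec(text: str) -> Dict[str, int]:
    """Assumed tokenizer: lowercase runs of letters/digits, counted."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


# Under this assumption, query_by_string(s) behaves like
# query(string_to_term_vec(s)).
print(string_to_term_vec("Stanford University, stanford"))
# {'stanford': 2, 'university': 1}
```
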

set_global_N(N)

Sets the global number of documents, N.
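
A global count matters when several indexes each hold part of a collection. Assuming N feeds an inverse-document-frequency style weight (this page does not say so explicitly), the same term with the same document frequency gets a different weight depending on whether the local count or the global count is used. The counts and the plain log(N/df) formula below are hypothetical.

```python
import math


def idf(n_docs: int, doc_freq: int) -> float:
    """Plain IDF, log(N / df); an assumed formula for illustration."""
    return math.log(n_docs / doc_freq)


local_n = 50     # hypothetical: what get_local_N() might return on one shard
global_n = 1000  # hypothetical: the value passed to set_global_N()
doc_freq = 10    # hypothetical: documents containing the term

print(round(idf(local_n, doc_freq), 3))   # 1.609
print(round(idf(global_n, doc_freq), 3))  # 4.605
```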

set_query_scorer(query_scorer)

Sets the scorer used to rank query results.

Params:
query_scorer: if a string, it is treated as the name of a scorer (e.g. 'simple_count', as in the sample usage above);
otherwise it is assumed to be a scoring object derived from query_scorer.QueryScorer.
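
The string-or-object convention can be sketched as a small dispatch. Everything here is a hypothetical stand-in: the QueryScorer base class only mimics query_scorer.QueryScorer, the score() method and the SCORERS registry are invented for illustration, and SimpleCountScorer's behaviour is a guess at what a "simple count" might mean.

```python
from typing import Dict, Type, Union


class QueryScorer:
    """Stand-in for query_scorer.QueryScorer (hypothetical interface)."""

    def score(self, query_vec, doc_vec) -> float:
        raise NotImplementedError


class SimpleCountScorer(QueryScorer):
    """Hypothetical scorer: counts query terms present in the document."""

    def score(self, query_vec, doc_vec) -> float:
        return float(sum(1 for term in query_vec if term in doc_vec))


# Hypothetical name -> scorer-class registry.
SCORERS: Dict[str, Type[QueryScorer]] = {"simple_count": SimpleCountScorer}


class ToyIndex:
    def __init__(self) -> None:
        self.query_scorer: QueryScorer = SimpleCountScorer()

    def set_query_scorer(self, query_scorer: Union[str, QueryScorer]) -> None:
        if isinstance(query_scorer, str):
            query_scorer = SCORERS[query_scorer]()  # look the scorer up by name
        # Otherwise, as documented, assume it is already a scorer object.
        self.query_scorer = query_scorer


index = ToyIndex()
index.set_query_scorer("simple_count")       # by name
index.set_query_scorer(SimpleCountScorer())  # or as an object
```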
