ShelfSimIndex
Sample usage:
from pprint import pprint
from pysimsearch.sim_index import ShelfSimIndex
from pysimsearch import doc_reader
sim_index = ShelfSimIndex()
sim_index.index_filenames('http://www.stanford.edu/',
                          'http://www.berkeley.edu',
                          'http://www.ucla.edu',
                          'http://www.mit.edu')
pprint(sim_index.postings_list('university'))
pprint(list(sim_index.docnames_with_terms('university', 'california')))
sim_index.set_query_scorer('simple_count')
pprint(list(sim_index.query_by_string("stanford university")))
Inherits from pysimsearch.sim_index.MapSimIndex.
Shelf-based implementation of SimIndex. Indexes are backed by persistent shelve.DbfilenameShelf objects.
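To illustrate what backing an index with shelve means, here is a minimal sketch that uses only the standard library. It is not the ShelfSimIndex implementation: the helpers add_doc and postings, the temporary file location, and the layout of one (docid, freq) list per term are all assumptions made for illustration.

```python
import os
import shelve
import tempfile
from collections import Counter


def add_doc(shelf, docid, text):
    """Record the term frequencies of `text` under `docid` (hypothetical helper)."""
    for term, freq in Counter(text.lower().split()).items():
        # Shelf values are pickled on assignment, so read, extend, and
        # write back the whole postings list instead of mutating in place.
        term_postings = shelf.get(term, [])
        term_postings.append((docid, freq))
        shelf[term] = term_postings


def postings(shelf, term):
    """Return the stored (docid, freq) tuples for `term`, or [] if unseen."""
    return shelf.get(term, [])


# Assumed location for this demo only; a real index would choose its own path.
index_path = os.path.join(tempfile.mkdtemp(), "term_index")

with shelve.open(index_path) as shelf:
    add_doc(shelf, 0, "stanford university california")
    add_doc(shelf, 1, "mit university")

# Reopen the shelf to show that the postings were persisted to disk.
with shelve.open(index_path) as shelf:
    university_postings = postings(shelf, "university")
    california_postings = postings(shelf, "california")

print(university_postings)
print(california_postings)
```

The read, extend, write-back pattern in add_doc is used because a shelf opened without writeback=True does not notice in-place changes to a value it has already returned.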
Returns a list of docids of documents containing all of the given terms.
Returns an iterable of docnames of documents containing the given terms.
Returns the local number of documents.
Builds a similarity index over the files given by filenames. Convenience method that wraps index_files().
Builds a similarity index over the collection given in named_files, where named_files is an iterable of (filename, file) pairs.
Adds string buffers to the index.
Returns list of (docid, freq) tuples for documents containing term
Finds documents similar to query_vec
Finds documents similar to query_string. Convenience method that calls self.query().
Sets the global number of documents.
Sets the query_scorer.
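The member descriptions above mention postings lists of (docid, freq) tuples, lookups of documents containing all terms, and scored queries. The following in-memory sketch shows one way these pieces can fit together. It is an illustration under stated assumptions, not pysimsearch code: build_postings, docids_with_terms, and count_matching_terms_query are hypothetical functions written for this example, docids are assumed to be assigned in input order, and the scorer simply counts distinct matching query terms, which may differ from what the library's 'simple_count' scorer computes.

```python
from collections import Counter, defaultdict


def build_postings(named_buffers):
    """Map each term to a list of (docid, freq) tuples.

    `named_buffers` is an iterable of (docname, text) pairs. Docids are
    assigned in input order; both conventions are assumptions of this sketch.
    """
    postings = defaultdict(list)
    docnames = []
    for docid, (docname, text) in enumerate(named_buffers):
        docnames.append(docname)
        for term, freq in Counter(text.lower().split()).items():
            postings[term].append((docid, freq))
    return postings, docnames


def docids_with_terms(postings, *terms):
    """Return the sorted docids of documents that contain every term."""
    if not terms:
        return []
    docid_sets = [
        {docid for docid, _ in postings.get(term, [])} for term in terms
    ]
    return sorted(set.intersection(*docid_sets))


def count_matching_terms_query(postings, query_string):
    """Score each document by how many distinct query terms it contains."""
    hits = Counter()
    for term in set(query_string.lower().split()):
        for docid, _ in postings.get(term, []):
            hits[docid] += 1
    # Highest score first; ties are broken by docid for a stable order.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


postings, docnames = build_postings([
    ("stanford", "stanford university california"),
    ("berkeley", "university of california berkeley"),
    ("mit", "mit university"),
])

matching_docids = docids_with_terms(postings, "university", "california")
print(postings["university"])
print([docnames[docid] for docid in matching_docids])
print(count_matching_terms_query(postings, "stanford university"))
```

Keeping docids in the postings and resolving docnames only at the end mirrors the split between the docid-based and docname-based lookups described above.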