The doc_reader Module

Utilities for creating term vectors from data

pysimsearch.doc_reader.get_named_text_files(*names)

Returns an iterator of (filename, file) tuples from filenames and/or URLs (convenience function).

pysimsearch.doc_reader.get_text_file(name)

Returns a text stream from a filename or URL.
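A minimal sketch of how a single name might be resolved to a text stream. It assumes that names starting with `http://` or `https://` are URLs, that everything else is a local path, and that content is UTF-8. `open_text` is a hypothetical name, not part of pysimsearch.

```python
import io
import urllib.request
from typing import TextIO


def open_text(name: str, encoding: str = "utf-8") -> TextIO:
    """Hypothetical analogue of get_text_file: open a URL or local path as text."""
    if name.startswith(("http://", "https://")):
        # urlopen() returns a binary response; wrap it so callers read str, not bytes.
        return io.TextIOWrapper(urllib.request.urlopen(name), encoding=encoding)
    # Anything else is treated as a path on the local filesystem.
    return open(name, encoding=encoding)
```

Either way the result supports `read()` and line iteration, and the caller is responsible for closing it; a `with` block works for both the URL and the local-file case.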

pysimsearch.doc_reader.get_text_files(*names)

Returns an iterator of files from filenames and/or URLs.
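The two iterator functions, `get_text_files` and `get_named_text_files` (listed above), can be sketched as lazy generators over a single-name opener. This is an illustration under assumptions: the hypothetical `opener` parameter defaults to a local-path, UTF-8 opener, and a URL-aware opener could be passed in instead. The names `text_files` and `named_text_files` are likewise hypothetical.

```python
from typing import Callable, Iterator, TextIO, Tuple

Opener = Callable[[str], TextIO]


def _local_open(name: str) -> TextIO:
    # Assumption for this sketch: local paths only, UTF-8 text.
    return open(name, encoding="utf-8")


def text_files(*names: str, opener: Opener = _local_open) -> Iterator[TextIO]:
    """Hypothetical analogue of get_text_files: yield one open stream per name."""
    for name in names:
        yield opener(name)


def named_text_files(
    *names: str, opener: Opener = _local_open
) -> Iterator[Tuple[str, TextIO]]:
    """Hypothetical analogue of get_named_text_files: pair each stream with its name."""
    for name in names:
        yield name, opener(name)
```

Because both are generators, each file is opened only when iteration reaches it, so a long list of names does not keep every file open at once provided the caller closes each stream before moving on. The (name, file) form is convenient when results need to be keyed by document, such as one term vector per file.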

pysimsearch.doc_reader.term_vec(file, stoplist=None)

Returns a term vector for ‘file’, represented as a dictionary of the form {term: frequency}
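A minimal sketch of building such a {term: frequency} dictionary with `collections.Counter`. The tokenization rule (lowercase, then take runs of ASCII letters and digits) and the treatment of `stoplist` as a collection of terms to skip are assumptions for illustration; pysimsearch's own tokenizer may differ. `term_vec_sketch` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional, TextIO

# Assumed tokenizer: runs of ASCII letters and digits, applied after lowercasing.
_TOKEN = re.compile(r"[a-z0-9]+")


def term_vec_sketch(
    file: TextIO, stoplist: Optional[Iterable[str]] = None
) -> Dict[str, int]:
    """Count term frequencies in a text stream, skipping any stoplisted terms."""
    stop = {w.lower() for w in stoplist} if stoplist else set()
    counts: Counter = Counter()
    for line in file:  # stream line by line rather than reading the whole file
        for term in _TOKEN.findall(line.lower()):
            if term not in stop:
                counts[term] += 1
    return dict(counts)
```

For example, counting `io.StringIO("The cat saw the dog.")` with `stoplist=["the"]` gives `{'cat': 1, 'saw': 1, 'dog': 1}`.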

pysimsearch.doc_reader.term_vec_from_string(s)

Returns a term vector for string s, represented as a dictionary of the form {term: frequency}

(Convenience function - wraps term_vec())
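Since the function is described as a wrapper around term_vec(), one natural sketch wraps the string in an in-memory stream with `io.StringIO` and hands it to the file-based counter. The stand-in counter here (`_term_vec`: lowercase, split on whitespace) and the name `term_vec_from_string_sketch` are hypothetical and deliberately simple; they are not pysimsearch's tokenizer.

```python
import io
from collections import Counter
from typing import Dict, TextIO


def _term_vec(file: TextIO) -> Dict[str, int]:
    # Hypothetical stand-in for term_vec(): lowercase, split on whitespace, count.
    return dict(Counter(file.read().lower().split()))


def term_vec_from_string_sketch(s: str) -> Dict[str, int]:
    """Wrap a string as a file-like object so the file-based function can be reused."""
    return _term_vec(io.StringIO(s))
```

Wrapping rather than reimplementing keeps a single tokenization path, so term vectors built from strings and from files are directly comparable.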
