pycrossword
0.4
Pure-Python implementation of a crossword puzzle generator and editor
An import task that loads words from a DIC file (downloaded from the Hunspell repo) into an SQLite database (*.db) file.
Public Member Functions
def __init__ (self, lang, dicfile=None, posrules=None, posrules_strict=False, posdelim='/', lcase=True, replacements=None, remove_hyphens=True, filter_out=None, rows=None, commit_each=1000, on_stopcheck=None, id=0)
def run (self)
Overridden worker method called when the task is started: does the import job.
Public Attributes
signals
HunspellImportSignals signals emitted by the import task
lang
str short name of the language, e.g. 'en'
dicfile
str | None full path to the DIC file to import words from
posrules
dict part-of-speech regular expression parsing rules
posrules_strict
bool import only the parts of speech listed in posrules (True) or all parts of speech (False)
posdelim
str delimiter separating the word from its part of speech (default = '/')
lcase
bool import words in lower case
replacements
dict character replacement rules
remove_hyphens
bool remove all hyphens from words
filter_out
dict regex-based rules to exclude words
rows
2-tuple | None the start and end rows (indices) of the words to import
commit_each
int threshold of DB insert operations after which the changes are written to the DB
on_stopcheck
callback callback function called periodically to check for interrupt condition
id
int unique ID of this task (in the thread pool)
Private Member Functions
def _delete_db (self, db)
Deletes the existing DB file.
def _get_pos (self, cur)
Retrieves the list of parts of speech present in the DB.
A single import task to import words from a DIC file (downloaded from the Hunspell repo) to an SQLite database *.db file.
Derived from QtCore.QRunnable so the task can be run in a thread pool concurrently with other tasks.
def pycross.dbapi.HunspellImportTask.__init__(self, lang, dicfile=None, posrules=None, posrules_strict=False, posdelim='/', lcase=True, replacements=None, remove_hyphens=True, filter_out=None, rows=None, commit_each=1000, on_stopcheck=None, id=0)
lang | str short name of the language, e.g. 'en' |
dicfile | str | None full path to the DIC file to import words from (None means the default path will be assumed: pycross/assets/dic/<LANGUAGE>.dic) |
posrules | dict part-of-speech regular expression parsing rules in the format: {'N': 'regex for nouns', 'V': 'regex for verbs', ...}
Possible keys are: 'N' [noun], 'V' [verb], 'ADV' [adverb], 'ADJ' [adjective], 'P' [participle], 'PRON' [pronoun], 'I' [interjection], 'C' [conjunction], 'PREP' [preposition], 'PROP' [proposition], 'MISC' [miscellaneous / other], 'NONE' [no POS] |
posrules_strict | bool if True, only the parts of speech present in the posrules dict will be imported [all other words will be skipped]. If False (default, per the constructor signature), such words will be imported with 'MISC' and 'NONE' POS markers. |
posdelim | str delimiter separating the word from its part of speech [default = '/'] |
lcase | bool if True (default), found words will be imported in lower case; otherwise, the original case will remain |
replacements | dict character replacement rules in the format: {'char_from': 'char_to', ...}, or None (no replacements) |
remove_hyphens | bool if True (default), all hyphens ['-'] will be removed from the words |
filter_out | dict regex-based rules to filter out [exclude] words in the format: {'word': ['regex1', 'regex2', ...], 'pos': ['regex1', 'regex2', ...]}, or None (no filter rules apply) |
rows | 2-tuple | None the start and end rows (indices) of the words to import; e.g. (20, 100) means start import from row 20 and end import after row 100. If the second element in the tuple is negative (e.g. -1), only the start row will be considered and the import will go on till the last word in the source DIC file. None means ALL available words. |
commit_each | int threshold of insert operations after which the transaction will be committed (default = 1000) |
on_stopcheck | callback callback function called periodically to check for interrupt condition; takes 3 parameters:
|
id | int unique ID of this task (in the thread pool) |
def pycross.dbapi.HunspellImportTask._delete_db(self, db) [private]
Deletes the existing DB file.
db | Sqlitedb a single SQLite database to delete |
def pycross.dbapi.HunspellImportTask._get_pos(self, cur) [private]
Retrieves the list of parts of speech present in the DB.
cur | SQLite cursor object the DB cursor |
Returns: list of the parts of speech in the short form, e.g. ['N', 'V']
def pycross.dbapi.HunspellImportTask.run(self)
Overridden worker method called when the task is started: does the import job.
pycross.dbapi.HunspellImportTask.commit_each |
int
threshold of DB insert operations after which the changes are written to the DB
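The batching that commit_each describes can be illustrated with the standard sqlite3 module. This is only a sketch: the table name twords, the helper import_words and the in-memory database are assumptions made for the example and are not taken from pycross.

```python
import sqlite3

def import_words(con, words, commit_each=1000):
    """Insert words, committing after every `commit_each` inserts.

    Hypothetical helper (not pycross API); returns the number of commits.
    """
    cur = con.cursor()
    commits = 0
    pending = 0
    for word in words:
        cur.execute("INSERT INTO twords (word) VALUES (?)", (word,))
        pending += 1
        if pending >= commit_each:
            con.commit()      # threshold reached: write the batch to the DB
            commits += 1
            pending = 0
    if pending:
        con.commit()          # flush the last, incomplete batch
        commits += 1
    return commits

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE twords (word TEXT)")  # assumed schema
n_commits = import_words(
    con, ["alpha", "beta", "gamma", "delta", "epsilon"], commit_each=2
)
```

A larger threshold means fewer commits and usually a faster import, at the cost of losing more uncommitted rows if the task is interrupted.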
pycross.dbapi.HunspellImportTask.dicfile |
str | None
full path to the DIC file to import words from
pycross.dbapi.HunspellImportTask.filter_out |
dict
regex-based rules to exclude words
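A minimal sketch of how rules in the documented filter_out format ({'word': [...], 'pos': [...]}) could be applied. It assumes that a word is excluded as soon as any rule matches and that matching uses re.search; the helper is_filtered_out and the sample rules are hypothetical, and the actual pycross logic may differ.

```python
import re

def is_filtered_out(word, pos, filter_out=None):
    """Return True if the word or its POS matches any exclusion regex."""
    if not filter_out:
        return False  # None or empty dict: no filter rules apply
    if any(re.search(rx, word) for rx in filter_out.get('word', [])):
        return True
    return any(re.search(rx, pos) for rx in filter_out.get('pos', []))

# Sample rules (assumed): drop words containing digits, words of one or
# two characters, and anything tagged 'PROP'.
rules = {'word': [r'\d', r'^.{1,2}$'], 'pos': [r'^PROP$']}
```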
pycross.dbapi.HunspellImportTask.id |
int
unique ID of this task (in the thread pool)
pycross.dbapi.HunspellImportTask.lang |
str
short name of the language, e.g. 'en'
pycross.dbapi.HunspellImportTask.lcase |
bool
import words in lower case
pycross.dbapi.HunspellImportTask.on_stopcheck |
callback
callback function called periodically to check for interrupt condition
pycross.dbapi.HunspellImportTask.posdelim |
str
delimiter separating the word from its part of speech (default = '/')
pycross.dbapi.HunspellImportTask.posrules |
dict
part-of-speech regular expression parsing rules
pycross.dbapi.HunspellImportTask.posrules_strict |
bool
import only the parts of speech listed in posrules (True) or all parts of speech (False)
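The interplay of posdelim, posrules and posrules_strict can be sketched as follows. This is an illustration under stated assumptions, not the pycross implementation: the helper classify is hypothetical, the sample flag-to-POS regexes are made up, and it is assumed that 'NONE' marks an entry with no POS part while 'MISC' marks a POS part that matches no rule.

```python
import re

def classify(entry, posrules, posdelim='/', strict=False):
    """Split a DIC entry into (word, [pos, ...]); None means 'skip'."""
    word, sep, flags = entry.strip().partition(posdelim)
    if not sep or not flags:
        # No POS information after the delimiter (assumed 'NONE' case).
        return None if strict else (word, ['NONE'])
    found = [pos for pos, rx in posrules.items() if re.search(rx, flags)]
    if found:
        return (word, found)
    # POS part present but matched by no rule (assumed 'MISC' case).
    return None if strict else (word, ['MISC'])

# Illustrative rules only; real rules depend on the dictionary's flags.
rules = {'N': r'[SM]', 'V': r'[GD]'}
```

With these rules, classify('house/MSGD', rules) yields both 'N' and 'V', whereas classify('hello', rules, strict=True) returns None because the entry carries no POS part.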
pycross.dbapi.HunspellImportTask.remove_hyphens |
bool
remove all hyphens from words
pycross.dbapi.HunspellImportTask.replacements |
dict
character replacement rules
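The combined effect of lcase, replacements and remove_hyphens can be sketched as a single normalization step. The helper normalize is hypothetical, and the order in which the three options are applied is an assumption; pycross may apply them differently.

```python
def normalize(word, lcase=True, replacements=None, remove_hyphens=True):
    """Apply lower-casing, character replacements and hyphen removal."""
    if lcase:
        word = word.lower()
    # replacements has the documented form {'char_from': 'char_to', ...}
    for char_from, char_to in (replacements or {}).items():
        word = word.replace(char_from, char_to)
    if remove_hyphens:
        word = word.replace('-', '')
    return word

cleaned = normalize('Vis-à-Vis', replacements={'à': 'a'})
untouched = normalize('Well-Known', lcase=False, remove_hyphens=False)
```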
pycross.dbapi.HunspellImportTask.rows |
2-tuple | None
the start and end rows (indices) of the words to import
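The rows semantics described for the constructor (inclusive end row, a negative end meaning "until the last word", None meaning all words) can be sketched with list slicing. The helper select_rows is hypothetical, and zero-based row indices are an assumption.

```python
def select_rows(lines, rows=None):
    """Return the subset of lines selected by a (start, end) tuple."""
    if rows is None:
        return list(lines)          # None: all available words
    start, end = rows
    if end < 0:
        return list(lines[start:])  # negative end: go on to the last word
    return list(lines[start:end + 1])  # the end row itself is included

lines = [f"w{i}" for i in range(10)]
middle = select_rows(lines, (2, 4))
tail = select_rows(lines, (7, -1))
```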
pycross.dbapi.HunspellImportTask.signals |
HunspellImportSignals
signals emitted by the import task