pycrossword  0.4
Pure-Python implementation of a crossword puzzle generator and editor
pycross.dbapi.HunspellImport Class Reference

Main interface to handle downloads and imports of Hunspell dictionaries as SQLite databases.

Public Member Functions

def __init__ (self, settings, dbmanager=None, dicfolder=DICFOLDER)
 
def pool_running (self)
 Checks if there are tasks running in the pool.
 
def pool_threadcount (self)
 Gets the number of active threads (tasks) in the pool.
 
def pool_wait (self)
 Waits for all the tasks in the pool to complete.
 
def get_installed_info (self, lang)
 Gets the information about an existing SQLite database: full path and number of words.
 
def list_hunspell (self, stopcheck=None)
 Retrieves the list of Hunspell dictionaries available for download from the public GitHub repo.
 
def list_all_dics (self, stopcheck=None)
 Retrieves the information for all available Hunspell dictionaries.
 
def download_hunspell (self, url, lang, overwrite=True, on_stopcheck=None, on_start=None, on_getfilesize=None, on_progress=None, on_complete=None, on_error=None, wait=False)
 Downloads a single Hunspell dictionary (*.dic file) and stores it locally.
 
def download_hunspell_all (self, dics, on_stopcheck=None, on_start=None, on_getfilesize=None, on_progress=None, on_complete=None, on_error=None)
 Downloads all the Hunspell dictionaries specified by the user.
 
def add_from_hunspell (self, lang, posrules=None, posrules_strict=False, posdelim='/', lcase=True, replacements=None, remove_hyphens=True, filter_out=None, rows=None, commit_each=1000, on_checkstop=None, on_start=None, on_word=None, on_commit=None, on_finish=None, on_error=None, wait=False)
 Imports a Hunspell-formatted dictionary file into the DB.
 
def add_all_from_hunspell (self, dics, posrules=None, posrules_strict=True, posdelim='/', lcase=True, replacements=None, remove_hyphens=True, filter_out=None, rows=None, commit_each=1000, on_stopcheck=None, on_start=None, on_word=None, on_commit=None, on_finish=None, on_error=None)
 Imports all Hunspell dictionaries specified by the user.
 

Public Attributes

 settings
 dict pointer to the app global settings (utils::guisettings::CWSettings::settings)
 
 db
 Sqlitedb | None DB object
 
 dicfolder
 str root path of the dictionaries, default = utils::globalvars::DICFOLDER
 
 pool
 QtCore.QThreadPool thread pool to run tasks
 
 timeout_
 int timeout for HTTP(S) requests (in milliseconds)
 
 proxies_
 dict HTTP(S) proxy server settings
 

Detailed Description

Main interface to handle downloads and imports of Hunspell dictionaries as SQLite databases.

Can start download and import tasks either in synchronous mode (start and wait for completion) or asynchronously (in a thread pool).
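The difference between the two modes can be pictured with the standard library's concurrent.futures.ThreadPoolExecutor standing in for the QtCore.QThreadPool that the class actually uses. This is a minimal sketch, not pycross code: fake_import and run_task are hypothetical names, and the wait flag only mimics the wait parameter of download_hunspell() and add_from_hunspell().

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def fake_import(lang: str) -> str:
    """Hypothetical stand-in for a download or import task."""
    time.sleep(0.05)  # simulate some work
    return f"imported {lang}"


def run_task(pool: ThreadPoolExecutor, func, *args, wait: bool = False):
    """Submit func to the pool; block for its result only when wait is True."""
    future: Future = pool.submit(func, *args)
    if wait:
        return future.result()  # synchronous mode: start and wait for completion
    return future               # asynchronous mode: return immediately


with ThreadPoolExecutor(max_workers=2) as pool:
    sync_result = run_task(pool, fake_import, "en", wait=True)
    pending = run_task(pool, fake_import, "de")  # the caller can keep working
    async_result = pending.result()              # collect the result later
```

In the asynchronous case the caller gets a handle back at once; with the real class, progress and completion are reported through the Qt signal callbacks (on_start, on_complete, etc.) instead of a Future.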

Constructor & Destructor Documentation

◆ __init__()

def pycross.dbapi.HunspellImport.__init__ (   self,
  settings,
  dbmanager = None,
  dicfolder = DICFOLDER 
)
Parameters
settings (dict): pointer to the app global settings (utils::guisettings::CWSettings::settings)
dbmanager (Sqlitedb | None): DB object (None to create a new one)
dicfolder (str): root path of the dictionaries, default = utils::globalvars::DICFOLDER

Member Function Documentation

◆ add_all_from_hunspell()

def pycross.dbapi.HunspellImport.add_all_from_hunspell (   self,
  dics,
  posrules = None,
  posrules_strict = True,
  posdelim = '/',
  lcase = True,
  replacements = None,
  remove_hyphens = True,
  filter_out = None,
  rows = None,
  commit_each = 1000,
  on_stopcheck = None,
  on_start = None,
  on_word = None,
  on_commit = None,
  on_finish = None,
  on_error = None 
)

Imports all Hunspell dictionaries specified by the user.

The import tasks are started asynchronously in the thread pool; each task uses HunspellImportTask::signals to signal its status and to check for interruption requests.

Parameters
dics (list): list of dict objects, each representing a single Hunspell dictionary (its URL, language, etc.). See list_hunspell() for the dict structure and add_from_hunspell() for the other parameters.

◆ add_from_hunspell()

def pycross.dbapi.HunspellImport.add_from_hunspell (   self,
  lang,
  posrules = None,
  posrules_strict = False,
  posdelim = '/',
  lcase = True,
  replacements = None,
  remove_hyphens = True,
  filter_out = None,
  rows = None,
  commit_each = 1000,
  on_checkstop = None,
  on_start = None,
  on_word = None,
  on_commit = None,
  on_finish = None,
  on_error = None,
  wait = False 
)

Imports a Hunspell-formatted dictionary file into the DB.

Parameters
lang (str): short name of the imported dictionary language, e.g. 'en', 'de' etc.
posrules (dict): part-of-speech regular expression parsing rules in the format:
{'N': 'regex for nouns', 'V': 'regex for verbs', ...}
Possible keys are: 'N' [noun], 'V' [verb], 'ADV' [adverb], 'ADJ' [adjective],
'P' [participle], 'PRON' [pronoun], 'I' [interjection],
'C' [conjunction], 'PREP' [preposition], 'PROP' [proposition],
'MISC' [miscellaneous / other], 'NONE' [no POS]
posrules_strict (bool): if True, only the parts of speech present in the posrules dict will be imported [all other words will be skipped]. If False (the default for this method), such words will be imported with the 'MISC' and 'NONE' POS markers.
posdelim (str): delimiter separating the word from its part of speech [default = '/']
lcase (bool): if True (default), found words will be imported in lower case; otherwise, the original case is preserved
replacements (dict): character replacement rules in the format:
{'char_from': 'char_to', ...}
Default = None (no replacements)
remove_hyphens (bool): if True (default), all hyphens ['-'] will be removed from the words
filter_out (dict): regex-based rules to filter out [exclude] words in the format:
{'word': ['regex1', 'regex2', ...], 'pos': ['regex1', 'regex2', ...]}
Matching words will not be imported. The 'pos' rules can be used to screen off specific parts of speech. Match rules for words are applied AFTER replacements, in the sequential order of the regex list. Default = None (no filter rules apply).
rows (2-tuple | None): the start and end rows (indices) of the words to import; e.g. (20, 100) means start the import at row 20 and end it after row 100. If the second element of the tuple is negative (e.g. -1), only the start row is considered and the import continues until the last word in the source DIC file. None means ALL available words.
commit_each (int): number of insert operations after which the transaction is committed (default = 1000)
on_checkstop (callback): function called periodically to check for an interrupt condition; takes 3 parameters:
  • id int unique ID of this task (in the thread pool)
  • lang str short name of the language, e.g. 'en'
  • filepath str full path to the source DIC file
Must return a Boolean value: True to stop the import task, False to continue.
on_start (callback): Qt slot (callback) for HunspellImportSignals::sigStart
on_word (callback): Qt slot (callback) for HunspellImportSignals::sigWordWritten
on_commit (callback): Qt slot (callback) for HunspellImportSignals::sigCommit
on_finish (callback): Qt slot (callback) for HunspellImportSignals::sigComplete
on_error (callback): Qt slot (callback) for HunspellImportSignals::sigError
wait (bool): True to wait for the task to complete; False to start the task asynchronously (without waiting for the result)
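Taken together, the word-level parameters describe a per-entry transformation pipeline. The sketch below is one plausible reading of how they combine, not pycross's actual implementation. It assumes that a DIC entry has the form word/FLAGS, that each posrules regex is searched in the text after posdelim (first matching key wins), that unmatched entries get 'MISC' when they carry flags and 'NONE' when they carry none, and that rows uses 0-based indices with an inclusive end. The helpers process_entry and select_rows and the sample entries are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def process_entry(line: str, posrules: Optional[dict] = None,
                  posrules_strict: bool = False, posdelim: str = '/',
                  lcase: bool = True, replacements: Optional[dict] = None,
                  remove_hyphens: bool = True,
                  filter_out: Optional[dict] = None) -> Optional[Tuple[str, str]]:
    """Turn one DIC entry into (word, pos), or return None to skip it."""
    word, _, flags = line.strip().partition(posdelim)
    # Assumption: each posrules regex is searched in the text after posdelim.
    pos = None
    for key, regex in (posrules or {}).items():
        if flags and re.search(regex, flags):
            pos = key
            break
    if pos is None:
        if posrules_strict and posrules:
            return None  # strict mode: unlisted parts of speech are skipped
        # Assumption: 'MISC' for unmatched flags, 'NONE' for no flags at all.
        pos = 'MISC' if flags else 'NONE'
    if lcase:
        word = word.lower()
    for char_from, char_to in (replacements or {}).items():
        word = word.replace(char_from, char_to)
    if remove_hyphens:
        word = word.replace('-', '')
    rules = filter_out or {}
    # Word rules run after replacements, in list order; any match excludes the word.
    if any(re.search(rx, word) for rx in rules.get('word', [])):
        return None
    if any(re.search(rx, pos) for rx in rules.get('pos', [])):
        return None
    return word, pos


def select_rows(lines: List[str], rows: Optional[Tuple[int, int]] = None) -> List[str]:
    """Apply the rows window: assumed 0-based, inclusive end, negative end = till EOF."""
    if rows is None:
        return list(lines)
    start, end = rows
    return lines[start:] if end < 0 else lines[start:end + 1]


dic_lines = ["Apple/N", "run/V", "well-known/A", "xyz"]
rules = {'N': r'N', 'V': r'V'}
strict_result = [r for r in (process_entry(ln, rules, posrules_strict=True)
                             for ln in dic_lines) if r]
loose_result = [r for r in (process_entry(ln, rules) for ln in dic_lines) if r]
no_verbs = process_entry("run/V", rules, filter_out={'pos': ['^V$']})
replaced = process_entry("Größe/N", rules, replacements={'ß': 'ss'})
window = select_rows(dic_lines, (1, 2))
tail = select_rows(dic_lines, (2, -1))
```

With posrules_strict=True only the noun and the verb survive; with the lenient setting the remaining two entries come through as 'MISC' and 'NONE'.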

◆ download_hunspell()

def pycross.dbapi.HunspellImport.download_hunspell (   self,
  url,
  lang,
  overwrite = True,
  on_stopcheck = None,
  on_start = None,
  on_getfilesize = None,
  on_progress = None,
  on_complete = None,
  on_error = None,
  wait = False 
)

Downloads a single Hunspell dictionary (*.dic file) and stores it locally.

Parameters
url (str): URL of the DIC file to download (generally, https://raw.githubusercontent.com/wooorm/dictionaries/main/dictionaries/<LANG>/index.dic)
lang (str): short name of the language, e.g. 'en'
overwrite (bool): whether to overwrite the existing file (if any)
on_stopcheck (callback): function called periodically to check for an interrupt condition; takes 4 parameters:
  • id int unique ID of this task (in the thread pool)
  • url str URL of the DIC file to download
  • lang str short name of the language, e.g. 'en'
  • filepath str full path to the downloaded (target) file
on_start (callback): Qt slot (callback) for HunspellDownloadSignals::sigStart
on_getfilesize (callback): Qt slot (callback) for HunspellDownloadSignals::sigGetFilesize
on_progress (callback): Qt slot (callback) for HunspellDownloadSignals::sigProgress
on_complete (callback): Qt slot (callback) for HunspellDownloadSignals::sigComplete
on_error (callback): Qt slot (callback) for HunspellDownloadSignals::sigError
wait (bool): True to wait for the task to complete; False to start the task asynchronously (without waiting for the result)
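A minimal sketch of two caller-side pieces: building the DIC URL from the generic pattern quoted above, and an on_stopcheck callback with the documented 4-parameter signature. It assumes the callback returns True to stop, as documented for the import task's on_checkstop; the helper names dic_url and make_stopcheck and the use of threading.Event as a cancellation flag are illustrative choices, not part of pycross.

```python
import threading

DIC_URL_TEMPLATE = ("https://raw.githubusercontent.com/wooorm/dictionaries/"
                    "main/dictionaries/{lang}/index.dic")


def dic_url(lang: str) -> str:
    """Build the generic DIC URL for a language code such as 'en'."""
    return DIC_URL_TEMPLATE.format(lang=lang)


def make_stopcheck(cancel_event: threading.Event):
    """Return an on_stopcheck callback bound to a cancellation flag."""
    def on_stopcheck(task_id, url, lang, filepath):
        # Assumption: returning True asks the download task to stop.
        return cancel_event.is_set()
    return on_stopcheck


cancel = threading.Event()
stopcheck = make_stopcheck(cancel)
url_en = dic_url("en")
before_cancel = stopcheck(1, url_en, "en", "/tmp/en.dic")
cancel.set()  # e.g. wired to a Cancel button in the GUI
after_cancel = stopcheck(1, url_en, "en", "/tmp/en.dic")
```

The resulting callable could then be passed as on_stopcheck=stopcheck to download_hunspell(); because threading.Event is thread-safe, the flag can be set from the GUI thread while the task polls it from a pool thread.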

◆ download_hunspell_all()

def pycross.dbapi.HunspellImport.download_hunspell_all (   self,
  dics,
  on_stopcheck = None,
  on_start = None,
  on_getfilesize = None,
  on_progress = None,
  on_complete = None,
  on_error = None 
)

Downloads all the Hunspell dictionaries specified by the user.

The download tasks are started asynchronously in the thread pool; each task uses HunspellDownloadTask::signals to signal its status and to check for interruption requests.

Parameters
dics (list): list of dict objects, each representing a single Hunspell dictionary (its URL, language, etc.). See list_hunspell() for the dict structure and download_hunspell() for the other parameters.

◆ get_installed_info()

def pycross.dbapi.HunspellImport.get_installed_info (   self,
  lang 
)

Gets the information about an existing SQLite database: full path and number of words.

Parameters
lang (str): short name of the language, e.g. 'en'
Returns
dict info in the format:
{'entries': number_of_entries, 'path': full_path_to_DB_file}
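For illustration only, a dict of this shape could be produced with the standard sqlite3 module as below. The table name twords and the helper installed_info are hypothetical placeholders, since the actual schema of the pycross database is not documented on this page; returning 0 and an empty path for a missing file mirrors the convention stated for list_all_dics().

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def installed_info(db_path: str, table: str = "twords") -> dict:
    """Build {'entries': ..., 'path': ...} for a SQLite DB file.

    The table name is a hypothetical placeholder, not the real pycross schema.
    """
    if not os.path.isfile(db_path):
        return {'entries': 0, 'path': ''}
    with closing(sqlite3.connect(db_path)) as conn:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return {'entries': count, 'path': os.path.abspath(db_path)}


with tempfile.TemporaryDirectory() as tmp:
    db_file = os.path.join(tmp, "en.db")
    with closing(sqlite3.connect(db_file)) as conn:
        conn.execute("CREATE TABLE twords (word TEXT, pos TEXT)")
        conn.executemany("INSERT INTO twords VALUES (?, ?)",
                         [("apple", "N"), ("run", "V")])
        conn.commit()
    info = installed_info(db_file)
    missing = installed_info(os.path.join(tmp, "de.db"))
```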

◆ list_all_dics()

def pycross.dbapi.HunspellImport.list_all_dics (   self,
  stopcheck = None 
)

Retrieves the information for all available Hunspell dictionaries.

Does everything list_hunspell() does, but also adds DB information (number of entries and path to the DB file) to each dictionary in the list.

Parameters
stopcheck (callback): callback that returns True to stop the operation or False to continue (takes no parameters)
Returns
list list of dictionaries representing language-specific dictionary info:
   * 'dic_url': URL of the dictionary file
   * 'lang': short language name, e.g. 'en' / 'ru' / 'it'
   * 'lang_full': full language name, e.g. 'Russian', 'English (US)'
   * 'license': name of applicable license, e.g. 'GPL-3.0' / 'MIT and BSD'
   * 'license_url': URL of applicable license file
   * 'entries': number of entries in the existing DB (0 if no DB exists or is empty)
   * 'path': full path to the existing DB file (empty string if no DB exists)
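Because list_all_dics() adds 'entries' and 'path' to each item, a caller can pick out the dictionaries that have no database yet and pass only those on as the dics argument of download_hunspell_all() or add_all_from_hunspell(). In this sketch the sample data is made up and trimmed to a subset of keys; only the key names come from this page, and not_installed is a hypothetical helper.

```python
import threading


def not_installed(dics: list) -> list:
    """Keep dictionaries whose DB is missing or empty (entries == 0)."""
    return [d for d in dics if d.get('entries', 0) == 0]


# Made-up sample shaped like the list_all_dics() output (subset of keys).
sample = [
    {'lang': 'en', 'lang_full': 'English (US)', 'entries': 120000, 'path': '/data/en.db'},
    {'lang': 'de', 'lang_full': 'German', 'entries': 0, 'path': ''},
    {'lang': 'ru', 'lang_full': 'Russian', 'entries': 0, 'path': ''},
]
to_fetch = [d['lang'] for d in not_installed(sample)]

# Any zero-argument callable returning a bool can serve as stopcheck.
cancel = threading.Event()
stopcheck = cancel.is_set
```

The bound method cancel.is_set matches the documented stopcheck contract (no parameters, True to stop), so it could be passed directly as stopcheck=stopcheck.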
 

◆ list_hunspell()

def pycross.dbapi.HunspellImport.list_hunspell (   self,
  stopcheck = None 
)

Retrieves the list of Hunspell dictionaries available for download from the public GitHub repo.

Parameters
stopcheck (callback): callback that returns True to stop the operation or False to continue (takes no parameters)
Returns
list list of dictionaries representing language-specific dictionary info:
   * 'dic_url': URL of the dictionary file
   * 'lang': short language name, e.g. 'en' / 'ru' / 'it'
   * 'lang_full': full language name, e.g. 'Russian', 'English (US)'
   * 'license': name of applicable license, e.g. 'GPL-3.0' / 'MIT and BSD'
   * 'license_url': URL of applicable license file
 

◆ pool_running()

def pycross.dbapi.HunspellImport.pool_running (   self)

Checks if there are tasks running in the pool.

Returns
bool True if there are active tasks, False if none

◆ pool_threadcount()

def pycross.dbapi.HunspellImport.pool_threadcount (   self)

Gets the number of active threads (tasks) in the pool.

Returns
int number of active tasks

◆ pool_wait()

def pycross.dbapi.HunspellImport.pool_wait (   self)

Waits for all the tasks in the pool to complete.
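pool_wait() simply blocks until every task finishes. When a caller would rather report progress while tasks run, a polling loop over pool_running() and pool_threadcount() is one option. The helper wait_with_report and the FakeImporter stub used to exercise it are hypothetical; note that in a Qt GUI such a loop would block the event loop, so it suits scripts or worker threads better.

```python
import time


def wait_with_report(importer, interval: float = 0.5, report=print) -> int:
    """Poll the importer's pool until it is idle; return the number of polls."""
    polls = 0
    while importer.pool_running():
        report(f"{importer.pool_threadcount()} task(s) still running")
        polls += 1
        time.sleep(interval)
    return polls


class FakeImporter:
    """Hypothetical stub exposing the same two methods as HunspellImport."""

    def __init__(self, tasks: int):
        self.tasks = tasks

    def pool_running(self) -> bool:
        return self.tasks > 0

    def pool_threadcount(self) -> int:
        # Report the current count, then simulate one task finishing.
        count = self.tasks
        self.tasks -= 1
        return count


messages = []
polls = wait_with_report(FakeImporter(3), interval=0, report=messages.append)
```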

Member Data Documentation

◆ db

pycross.dbapi.HunspellImport.db

Sqlitedb | None DB object

◆ dicfolder

pycross.dbapi.HunspellImport.dicfolder

str root path of the dictionaries, default = utils::globalvars::DICFOLDER

◆ pool

pycross.dbapi.HunspellImport.pool

QtCore.QThreadPool thread pool to run tasks

◆ proxies_

pycross.dbapi.HunspellImport.proxies_

dict HTTP(S) proxy server settings

◆ settings

pycross.dbapi.HunspellImport.settings

dict pointer to the app global settings (utils::guisettings::CWSettings::settings)

◆ timeout_

pycross.dbapi.HunspellImport.timeout_

int timeout for HTTP(S) requests (in milliseconds)


The documentation for this class was generated from the following file: