hifis_surveyval.core package
Submodules
hifis_surveyval.core.dispatch module
This module allows discovery and dispatch of analysis functions.
- class hifis_surveyval.core.dispatch.Dispatcher(surveyval: hifis_surveyval.hifis_surveyval.HIFISSurveyval, data: hifis_surveyval.data_container.DataContainer)
Bases:
object
Provides facilities to discover and execute analysis function modules.
The operations are based on a module folder and optionally a list of module names to be given at initialization.
- __init__(surveyval: hifis_surveyval.hifis_surveyval.HIFISSurveyval, data: hifis_surveyval.data_container.DataContainer) -> None
Initialize the Dispatcher.
- Args:
- surveyval (HIFISSurveyval): The HIFISSurveyval object, which is passed through to the individual analysis scripts.
- discover() -> None
Discover all potential or selected modules in the module folder.
Iterate over all modules in the module folder (non-recursive) or selected modules only and cache the names of those python (.py) files. Exception: __init__.py is excluded.
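The discovery rule described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the Dispatcher's actual code: the function name `discover_module_names` and its parameters `module_folder` and `selected` are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def discover_module_names(
    module_folder: Path, selected: Optional[List[str]] = None
) -> List[str]:
    """Return the names (without '.py') of candidate modules in a folder.

    Hypothetical helper: the name and parameters are assumptions made for
    this sketch and are not part of hifis_surveyval.
    """
    names = []
    # iterdir() does not descend into subfolders, so the scan is non-recursive.
    for entry in sorted(module_folder.iterdir()):
        if not entry.is_file() or entry.suffix != ".py":
            continue
        # __init__.py is explicitly excluded from discovery.
        if entry.name == "__init__.py":
            continue
        # If a selection was given, keep only the selected module names.
        if selected and entry.stem not in selected:
            continue
        names.append(entry.stem)
    return names
```

Calling `discover_module_names(Path("scripts"))` would list every Python file directly inside `scripts`, while passing a list of names as the second argument restricts the result to those modules.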
- load_all_modules() -> None
Try to load and run all discovered modules.
Make sure to run discover() beforehand. If no modules have been discovered, a warning will be logged. See Also: load_module()
- load_module(module_name: str) -> None
Attempt to load a module given by name.
Exceptions raised during import will be caught and logged as errors on the console.
- Args:
module_name (str): The name of the module, without the .py ending
- Raises:
ImportError: Exception thrown if script could not be loaded.
AttributeError: Exception thrown if run method could not be executed.
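As a rough illustration of loading a module by name and running it while logging failures, the following sketch uses `importlib`. It is an assumption-laden stand-in, not the Dispatcher's implementation: the function `load_and_run` is hypothetical, and the module's `run()` is assumed to take no arguments here, which may differ from what real analysis scripts expect.

```python
import importlib
import logging
import sys
from pathlib import Path
from types import ModuleType
from typing import Optional


def load_and_run(module_folder: Path, module_name: str) -> Optional[ModuleType]:
    """Import a module by name (without '.py') and call its run() function.

    Hypothetical helper for illustration; returns None if loading or
    running failed, after logging the error.
    """
    folder = str(module_folder)
    # Make the module folder importable (assumption of this sketch).
    if folder not in sys.path:
        sys.path.insert(0, folder)
    try:
        module = importlib.import_module(module_name)
        # Assumption: run() takes no arguments in this simplified sketch.
        module.run()
    except ImportError as error:
        logging.error("Could not load module %s: %s", module_name, error)
        return None
    except AttributeError as error:
        logging.error("Could not execute run() of %s: %s", module_name, error)
        return None
    return module
```

A missing module and a module without a `run` function are both reported through the logger instead of stopping the whole run, which mirrors the "caught and logged" behaviour described above.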
hifis_surveyval.core.preprocess module
This module starts a preprocessing script, if it exists.
- class hifis_surveyval.core.preprocess.Preprocessor
Bases:
object
Provides the means to run a preprocessing script.
- classmethod preprocess(settings: hifis_surveyval.core.settings.Settings, data: hifis_surveyval.data_container.DataContainer) -> hifis_surveyval.data_container.DataContainer
Run preprocessing script.
Exceptions raised during import will be caught and logged as errors on the console.
- Args:
settings (Settings): The settings of the run.
data (DataContainer): The data to preprocess.
- Raises:
ImportError: Exception thrown if script could not be loaded.
AttributeError: Exception thrown if run method could not be executed.
hifis_surveyval.core.settings module
This module handles settings.
It provides:
* settings classes
* getter for settings
* an export function to create a file
- class hifis_surveyval.core.settings.FileSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, PREPROCESSING_FILENAME: pathlib.Path = PosixPath('preprocess.py'), METADATA: pathlib.Path = PosixPath('metadata'), SCRIPT_FOLDER: pathlib.Path = PosixPath('scripts'), SCRIPT_NAMES: List[str] = [], OUTPUT_FORMAT: hifis_surveyval.plotting.supported_output_format.SupportedOutputFormat = SupportedOutputFormat.SCREEN, OUTPUT_FOLDER: pathlib.Path = PosixPath('output'), ID_COLUMN_NAME: str = 'id', ANONYMOUS_QUESTION_ID: str = '_', HIERARCHY_SEPARATOR: str = '/', DATA_ID_SEPARATOR: str = '_', CUSTOM_PLOT_STYLE: str = '')
Bases:
pydantic.env_settings.BaseSettings
Settings that the user can change.
- ANONYMOUS_QUESTION_ID: str
- CUSTOM_PLOT_STYLE: Optional[str]
- class Config
Bases:
object
Subclass for specification.
See https://pydantic-docs.helpmanual.io/usage/model_config/ for details.
- case_sensitive = True
- DATA_ID_SEPARATOR: str
- HIERARCHY_SEPARATOR: str
- ID_COLUMN_NAME: str
- METADATA: pathlib.Path
- OUTPUT_FOLDER: pathlib.Path
- PREPROCESSING_FILENAME: pathlib.Path
- SCRIPT_FOLDER: pathlib.Path
- SCRIPT_NAMES: List[str]
- classmethod validate_metadata_folder(to_validate: pathlib.Path) -> pathlib.Path
Ensure the metadata folder is a folder and exists.
- Args:
- to_validate:
The path to the metadata folder, which is to be validated.
- Returns:
The path to the metadata folder if it is valid.
- Raises:
- ValueError:
If either the given path was not a folder or did not exist.
- classmethod validate_preprocessing_script(to_validate: pathlib.Path) -> pathlib.Path
Ensure that preprocessing script is a Python file.
- Args:
- to_validate:
Preprocessing script path to be validated.
- Returns:
Path to the preprocessing script.
- Raises:
- ValueError:
If the given script path did not end with “.py” and therefore is probably not a Python script.
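The check performed by this validator can be pictured as a plain function. The sketch below is deliberately framework-free: how the real classmethod is registered as a validator is not shown, and the name `check_preprocessing_script` is hypothetical.

```python
from pathlib import Path


def check_preprocessing_script(to_validate: Path) -> Path:
    """Return the path unchanged if it looks like a Python file.

    Hypothetical stand-in for the validator; only the file suffix is
    inspected, the file is neither opened nor required to exist.
    """
    if to_validate.suffix != ".py":
        raise ValueError(
            f"{to_validate} does not end with '.py' "
            "and is probably not a Python script"
        )
    return to_validate
```

Note that a suffix check is only a heuristic: it accepts any file named `*.py` regardless of its content.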
- class hifis_surveyval.core.settings.Settings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, PREPROCESSING_FILENAME: pathlib.Path = PosixPath('preprocess.py'), METADATA: pathlib.Path = PosixPath('metadata'), SCRIPT_FOLDER: pathlib.Path = PosixPath('scripts'), SCRIPT_NAMES: List[str] = [], OUTPUT_FORMAT: hifis_surveyval.plotting.supported_output_format.SupportedOutputFormat = SupportedOutputFormat.SCREEN, OUTPUT_FOLDER: pathlib.Path = PosixPath('output'), ID_COLUMN_NAME: str = 'id', ANONYMOUS_QUESTION_ID: str = '_', HIERARCHY_SEPARATOR: str = '/', DATA_ID_SEPARATOR: str = '_', CUSTOM_PLOT_STYLE: str = '', CONFIG_FILENAME: pathlib.Path = PosixPath('hifis-surveyval.yml'), VERBOSITY: int = 0, RUN_TIMESTAMP: str = None, ANALYSIS_OUTPUT_PATH: pathlib.Path = None, TRUE_VALUES: Set[str] = {'1', 'On', 'True', 'Y', 'Yes'}, FALSE_VALUES: Set[str] = {'0', 'False', 'N', 'No', 'Off'})
Bases:
hifis_surveyval.core.settings.SystemSettings, hifis_surveyval.core.settings.FileSettings
Merges the two settings subtypes.
- ANALYSIS_OUTPUT_PATH: pathlib.Path
- CONFIG_FILENAME: pathlib.Path
- FALSE_VALUES: Set[str]
A set of strings to be interpreted as boolean ‘False’ when parsing the input data.
- RUN_TIMESTAMP: str
- TRUE_VALUES: Set[str]
A set of strings to be interpreted as boolean ‘True’ when parsing the input data.
- VERBOSITY: int
- class hifis_surveyval.core.settings.SystemSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, CONFIG_FILENAME: pathlib.Path = PosixPath('hifis-surveyval.yml'), VERBOSITY: int = 0, RUN_TIMESTAMP: str = None, ANALYSIS_OUTPUT_PATH: pathlib.Path = None, TRUE_VALUES: Set[str] = {'1', 'On', 'True', 'Y', 'Yes'}, FALSE_VALUES: Set[str] = {'0', 'False', 'N', 'No', 'Off'})
Bases:
pydantic.env_settings.BaseSettings
Settings that are not loaded from a file.
- ANALYSIS_OUTPUT_PATH: pathlib.Path
- CONFIG_FILENAME: pathlib.Path
- FALSE_VALUES: Set[str]
A set of strings to be interpreted as boolean ‘False’ when parsing the input data.
- RUN_TIMESTAMP: str
- TRUE_VALUES: Set[str]
A set of strings to be interpreted as boolean ‘True’ when parsing the input data.
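How the two sets might be used when parsing input can be sketched as follows. The set contents are the defaults listed in the class signature; the function `parse_bool` is hypothetical, and exact, case-sensitive matching is an assumption of this sketch rather than documented behaviour.

```python
from typing import Optional, Set

# Default values as listed in the SystemSettings signature.
TRUE_VALUES: Set[str] = {"1", "On", "True", "Y", "Yes"}
FALSE_VALUES: Set[str] = {"0", "False", "N", "No", "Off"}


def parse_bool(raw: str) -> Optional[bool]:
    """Map a raw input string to a boolean, or None if it is not recognised.

    Hypothetical helper; matching is exact and case-sensitive here,
    which is an assumption of this sketch.
    """
    if raw in TRUE_VALUES:
        return True
    if raw in FALSE_VALUES:
        return False
    return None
```

Under these assumptions, `parse_bool("Yes")` yields `True`, `parse_bool("Off")` yields `False`, and any string outside both sets yields `None`.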
- VERBOSITY: int
- classmethod assemble_output_path(to_validate: str, values: Dict[str, Any]) -> pathlib.Path
Assemble path from user settings and datetime.
- Args:
- to_validate (str):
Analysis output path as string to be validated.
- values (Dict[str, Any]):
Parts of the analysis output path to be concatenated as an absolute path.
- Returns:
Path: Path to the output folder of an analysis run.
hifis_surveyval.core.util module
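One plausible way to combine an output folder with a run timestamp into an absolute path is sketched below. Which settings values the real validator actually concatenates is not stated here, so the function `sketch_output_path`, its parameters, and the timestamp format are all assumptions.

```python
from datetime import datetime
from pathlib import Path


def sketch_output_path(output_folder: Path, run_timestamp: str) -> Path:
    """Join an output folder and a timestamp into an absolute path.

    Hypothetical illustration; the real validator may combine different
    parts of the settings.
    """
    return (output_folder / run_timestamp).resolve()


# Assumed timestamp format, chosen only for this example.
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
analysis_output_path = sketch_output_path(Path("output"), timestamp)
```

Giving each run its own timestamped subfolder keeps the results of separate analysis runs from overwriting each other.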
This module provides helper functions.
- hifis_surveyval.core.util.create_custom_plot_style_template() -> None
Create Matplotlib custom plot style file template.
- hifis_surveyval.core.util.create_example_script(settings: hifis_surveyval.core.settings.Settings) -> None
Create an example script from data payload at the default script location.
- Args:
- settings (Settings):
Settings of the analysis run.
- hifis_surveyval.core.util.create_preprocessing_script(settings: hifis_surveyval.core.settings.Settings) -> None
Create an empty preprocessing script at the default location.
- Args:
- settings (Settings):
Settings of the analysis run.
- hifis_surveyval.core.util.cross_reference_sum(data: pandas.DataFrame, grouping: pandas.Series) -> pandas.DataFrame
Cross-reference a data frame with a series and count correlations.
The data frame is processed column-wise. For each column, indices are grouped up by their respective value in the grouping series and each group is summed up.
Columns with incomplete data or rows that cannot be cross-referenced may be dropped.
In the context of the survey analysis, data usually is a multiple choice question, while the grouping series is a single choice question. They get matched by the participant IDs and the correlations get summed up.
- Args:
- data (DataFrame):
A data frame of which the columns are to be grouped and summed up.
- grouping (Series):
A series with indices (mostly) matching that of “data”, associating each index with a group towards which the values of “data” are to be counted.
- Returns:
- DataFrame:
A data frame containing the columns from data (minus dropped columns) and the unique values of the grouping series as indices. Each cell at [column, index] holds the sum of the values in the respective column of the data which corresponded to the index in the grouping series.
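The grouping-and-summing behaviour described above resembles a plain pandas `groupby` followed by `sum`. The sketch below uses that equivalent on made-up data; it does not call `cross_reference_sum` itself, and the participant IDs, column names and group labels are invented for illustration.

```python
import pandas as pd

# Multiple choice question: one column per option, 1 = option selected.
data = pd.DataFrame(
    {"Python": [1, 0, 1, 1], "R": [0, 1, 1, 0]},
    index=["p1", "p2", "p3", "p4"],
)

# Single choice question: one group label per participant ("p4" is missing).
grouping = pd.Series(
    ["Physics", "Biology", "Physics"],
    index=["p1", "p2", "p3"],
)

# groupby aligns the series with the data frame's index; rows without a
# group label (here "p4") are dropped before the per-group sums are built.
summed = data.groupby(grouping).sum()
```

In `summed`, the index holds the unique group labels and each cell holds how many participants of that group selected the respective option.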
- hifis_surveyval.core.util.dataframe_value_counts(dataframe: pandas.DataFrame, relative_values: bool = False, drop_nans: bool = True) -> pandas.DataFrame
Count how often a unique value appears in each column of a data frame.
- Args:
- dataframe (DataFrame):
The data frame of which the values shall be counted.
- relative_values (bool):
Instead of absolute counts, fill the cells with their relative contribution to the column total.
- drop_nans (bool):
Whether to remove the NaN value count. Defaults to True.
- Returns:
- DataFrame:
A new data frame with the same columns as the input. The index is changed to represent the unique values and the cells contain the count of the unique values in the given column.
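A comparable result can be produced with plain pandas by applying `Series.value_counts` to every column. The following sketch shows that approach on invented data; it is not the function's implementation, and the mapping of `relative_values` to `normalize=True` is an assumption.

```python
import pandas as pd

answers = pd.DataFrame(
    {
        "Q1": ["yes", "no", "yes", None],
        "Q2": ["no", "no", "yes", "yes"],
    }
)

# Absolute counts per column; missing values are left out (dropna=True).
counts = answers.apply(lambda column: column.value_counts(dropna=True))

# Relative counts: each cell is the share of the column's non-missing values.
relative = answers.apply(lambda column: column.value_counts(normalize=True))
```

Both results keep the columns `Q1` and `Q2`, while the index now consists of the unique answers found in the data.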
- hifis_surveyval.core.util.filter_and_group_series(base_data: pandas.Series, group_by: pandas.Series, min_value: Optional[float] = None, max_value: Optional[float] = None) -> pandas.DataFrame
Filter a series and group its values according to another series.
Generate a sparse DataFrame in which all values of base_data are assigned to a column according to the corresponding value for the same index in group_by.
Indexes not present in group_by will result in an empty row. Indexes not present in base_data will result in an empty column.
- Args:
- base_data (Series):
The series of which the data is to be sorted and filtered.
- group_by (Series):
A series assigning each index to a group.
- min_value (Optional[float]):
An optional minimum value. All values of base_data below this value will be excluded from the result. Not set by default.
- max_value (Optional[float]):
An optional maximum value. All values of base_data above this value will be excluded from the result. Not set by default.
- Returns:
- DataFrame:
A new DataFrame where each row represents an index of base_data and each column is one of the unique values of the group_by series. The values of base_data are put into the column where the base_data index matches the group_by index.
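The shape of the resulting sparse frame can be illustrated with plain pandas: filter the series, attach the group labels, then pivot. This is a sketch on invented data and not the function's implementation; in particular, dropping filtered-out rows entirely (instead of keeping them as empty rows) is an assumption.

```python
import pandas as pd

base_data = pd.Series([5.0, 12.0, 7.0, 30.0], index=["p1", "p2", "p3", "p4"])
group_by = pd.Series(["A", "B", "A", "B"], index=["p1", "p2", "p3", "p4"])

# Keep only values within the assumed bounds min_value=6 and max_value=20.
filtered = base_data[(base_data >= 6.0) & (base_data <= 20.0)]

# Pair every remaining value with the group label of the same index.
frame = pd.DataFrame(
    {"value": filtered, "group": group_by.reindex(filtered.index)}
)

# One row per index, one column per group; cells outside a row's own
# group stay empty (NaN), which makes the frame sparse.
sparse = frame.pivot(columns="group", values="value")
```

Each row of `sparse` contains exactly one filled cell, located in the column of the group that the row's index belongs to.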
Module contents
This package provides core functionalities.