transformers package
Submodules
transformers.categoricals module
Module to transform categorical variables into numeric ones
- Author(s):
Mike Skarlinski (michael.skarlinski@weightwatchers.com)
-
class transformers.categoricals.ExplicitCategoricalTransform(categoricals)
Bases: primrose.base.transformer.AbstractTransformer
-
DEFAULT_NUMERIC = -9999
-
fit(data)
User implements fit operation on a single data element from a data_object
- Parameters
data (object) – some data
- Returns
data
-
transform(data)
Transform categorical variables into one or more numeric ones. There is no need to separate testing and training data.
- Parameters
data – dictionary containing dataframe with all categorical columns present
- Returns
data with all categorical columns recoded and/or deleted
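The kind of recoding described above can be pictured with a minimal pandas sketch. It is not primrose's implementation: the `explicit_recode` function and its `mappings` and `one_hot` arguments are hypothetical, and using DEFAULT_NUMERIC (-9999) as the value for unmapped categories is an assumption based only on the attribute's name. Because the recodes are fixed in advance rather than learned from the data, the same call works on training and testing data alike.

```python
import pandas as pd

# Mirrors the documented DEFAULT_NUMERIC value; using it as the fallback
# for unmapped categories is an assumption of this sketch.
DEFAULT_NUMERIC = -9999


def explicit_recode(df, mappings, one_hot=()):
    """Hypothetical sketch of explicit categorical recoding.

    mappings: {column: {category: number}} (hypothetical config shape);
              unmapped or missing values become DEFAULT_NUMERIC.
    one_hot:  columns to expand into one 0/1 column per category.
    """
    out = df.copy()
    for column, mapping in mappings.items():
        # One categorical column becomes one numeric column.
        out[column] = out[column].map(mapping).fillna(DEFAULT_NUMERIC).astype(int)
    if one_hot:
        # One categorical column becomes several indicator columns;
        # get_dummies drops the original column.
        out = pd.get_dummies(out, columns=list(one_hot), dtype=int)
    return out
```

For example, `explicit_recode(df, {"size": {"S": 0, "M": 1, "L": 2}}, one_hot=["color"])` recodes `size` in place and replaces `color` with `color_blue` and `color_red` indicator columns.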
-
-
class transformers.categoricals.ImplicitCategoricalTransform(target_variable)
Bases: primrose.base.transformer.AbstractTransformer
Implicitly transforms all string columns of a dataframe with sklearn's LabelEncoder.
-
fit(data)
Encode the data as categorical labels.
- Parameters
data (dataframe)
- Returns
dataframe (dataframe)
-
transform(data)
Transform data into categorical variables using pre-trained label encoder
- Parameters
data (dataframe)
- Returns
dataframe (dataframe)
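The fit/transform split above can be sketched with sklearn's LabelEncoder: fit one encoder per string column, then reuse those encoders on new data. `LabelEncodeStrings` and its `exclude` argument are hypothetical; the documentation does not say how `target_variable` is treated, so this sketch leaves any exclusion to the caller.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


class LabelEncodeStrings:
    """Hypothetical sketch: one LabelEncoder per string column."""

    def __init__(self, exclude=()):
        self.exclude = set(exclude)  # columns to leave untouched (assumption)
        self.encoders = {}

    def fit(self, df):
        # Learn the sorted set of labels for every string column.
        for column in df.columns:
            if column in self.exclude or not pd.api.types.is_string_dtype(df[column]):
                continue
            self.encoders[column] = LabelEncoder().fit(df[column])
        return self

    def transform(self, df):
        # Reuse the encoders learned in fit, so codes stay consistent across datasets.
        out = df.copy()
        for column, encoder in self.encoders.items():
            out[column] = encoder.transform(out[column])
        return out
```

LabelEncoder.transform raises a ValueError for labels it did not see during fit, so data passed to transform should contain only known categories.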
-
transformers.combine module
Custom combiner object which works via pandas left merges
- Author(s):
Mike Skarlinski (michael.skarlinski@weightwatchers.com)
-
class transformers.combine.LeftJoinDataCombiner(join_key)
Bases: primrose.base.transformer.AbstractTransformer
Combines two dataframes using a left join.
-
fit(data)
Fit the data (a no-op here).
- Parameters
data (list) – list of pandas data frames
- Returns
nothing
-
transform(data)
Applies left joins to combine dataframes from different readers
- Parameters
data (list) – list of dataframes
- Returns
dataframe
-
-
transformers.combine.left_merge_dataframe_on_validated_join_keys(left_df, right_df, join_keys)
Merge two dataframes, or return left_df unchanged if right_df is None.
- Parameters
left_df – valid dataframe with join keys of matching data type (will be validated)
right_df – None or valid dataframe with join keys of matching data type (will be validated)
join_keys – list of keys to be validated for data type and existence in left_df and right_df
- Returns
joined dataframe object
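A minimal sketch of a validated left merge, plus a left-to-right fold that a combiner could apply to a list of dataframes. Both function names are hypothetical, and the specific checks (each key present on both sides, identical dtypes) are assumptions drawn from the parameter descriptions above rather than primrose's actual code.

```python
from functools import reduce

import pandas as pd


def left_merge_validated(left_df, right_df, join_keys):
    """Hypothetical sketch of a validated left merge."""
    if right_df is None:
        # Nothing to join: hand back the left dataframe unchanged.
        return left_df
    for key in join_keys:
        # Each key must exist on both sides...
        if key not in left_df.columns or key not in right_df.columns:
            raise ValueError(f"join key {key!r} missing from one of the dataframes")
        # ...and share a dtype; mismatched key types can make pandas raise
        # or produce unexpected matches.
        if left_df[key].dtype != right_df[key].dtype:
            raise TypeError(f"join key {key!r} has mismatched dtypes")
    return left_df.merge(right_df, how="left", on=join_keys)


def combine(dataframes, join_keys):
    # Fold left to right, so every row of the first dataframe is kept.
    return reduce(lambda left, right: left_merge_validated(left, right, join_keys), dataframes)
```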
transformers.filter module
Transform data by filtering rows with filtering operations
- Author(s):
Michael Skarlinski (michael.skarlinski@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)
-
class transformers.filter.FilterByPandasExpression(feature_filters)
Bases: primrose.base.transformer.AbstractTransformer
Applies filters to data as defined in feature_filters
-
fit(data)
Fit data (a no-op here).
- Parameters
data (object) – some data
-
transform(data)
Applies filters to data as defined in feature_filters. This is necessary so that rows in one reader can be filtered based on information from another; as a result, it has to be applied after the combiner step.
Each filter operates on a single column, using one of a fixed set of operations and a static value:
fixed operations: ==, !=, <>, <, <=, >, >=
The feature_filters object should be structured as a list of lists:
feature_filters: [[column, operation, static value], [column, operation, static value]]
example: [["number_of_members", "<", 1000]] for filtering all rows with number_of_members less than 1000
- Parameters
data (dict) – dictionary with dataframes from all readers
data_key (str) – key to pull the dataframe from within the data object
feature_filters (list) – list of lists with columns and operators to filter on
instance_name (str) – name of this pipeline instance
- Returns
dataframe with filtered data
- Raises
Exception – if data is not a pandas dataframe, an operation is not supported, or a column name is not recognized
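The filter format above can be sketched with boolean masks built from Python's operator module. This is an illustration, not primrose's code: the function name is hypothetical, it assumes each filter keeps the rows that satisfy it and that multiple filters are combined with AND, and it treats "<>" as a synonym for "!=". It also builds masks directly instead of a pandas query string, despite the class name.

```python
import operator

import pandas as pd

# The documented operation set; "<>" is treated as "not equal".
OPERATIONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def apply_feature_filters(df, feature_filters):
    """Hypothetical sketch: keep rows matching every [column, operation, value] filter."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("data must be a pandas DataFrame")
    mask = pd.Series(True, index=df.index)
    for column, operation, value in feature_filters:
        if operation not in OPERATIONS:
            raise ValueError(f"unsupported operation {operation!r}")
        if column not in df.columns:
            raise KeyError(f"unrecognized column {column!r}")
        # AND the filters together: a row must satisfy all of them.
        mask &= OPERATIONS[operation](df[column], value)
    return df[mask]
```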
-
transformers.impute module
Custom imputer to replace values within pandas dataframe columns with config-specified imputation schemes
- Author(s):
Reka Daniel-Weiner (reka.danielweiner@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)
-
class transformers.impute.ColumnSpecificImpute(columns_to_zero, columns_to_mean, columns_to_median, columns_to_mode, columns_to_infinity, columns_to_neg_infinity)
Bases: primrose.base.transformer.AbstractTransformer
Replaces NULL values in config-specified columns with zero, the column's mean, median, or mode, infinity, or negative infinity.
-
fit(data)
Fit imputation values to dataframe metrics. Creates a dictionary mapping each column name to its imputed value. These values might be constants or a function of the complete column's values, such as the mode or median.
- Parameters
data (pandas data frame) – a data frame
- Returns
Nothing. Updates internal dictionary of column name to imputed value
- Raises
Exception – if a column appears in multiple lists or is not recognized
-
transform(data)
Impute columns in data according to the imputations fit by self.fit
- Parameters
data (dataframe)
- Returns
data (dataframe)
- Raises
Exception – if fit is not called before transform
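The fit/transform contract described above (learn one fill value per column, apply them later, and refuse to transform before fitting) can be sketched in pandas as follows. The class name, the empty-list defaults, and the exception types are assumptions; for mode, the sketch takes the first mode when several values tie.

```python
import pandas as pd

# How each scheme turns a column into a fill value: a constant or a column statistic.
FILL_RULES = {
    "zero": lambda s: 0,
    "mean": lambda s: s.mean(),
    "median": lambda s: s.median(),
    "mode": lambda s: s.mode().iloc[0],  # first mode when there are ties
    "infinity": lambda s: float("inf"),
    "neg_infinity": lambda s: float("-inf"),
}


class ColumnImputeSketch:
    """Hypothetical config-driven imputer; not primrose's implementation."""

    def __init__(self, columns_to_zero=(), columns_to_mean=(), columns_to_median=(),
                 columns_to_mode=(), columns_to_infinity=(), columns_to_neg_infinity=()):
        self.schemes = {
            "zero": columns_to_zero,
            "mean": columns_to_mean,
            "median": columns_to_median,
            "mode": columns_to_mode,
            "infinity": columns_to_infinity,
            "neg_infinity": columns_to_neg_infinity,
        }
        self.fill_values = None  # populated by fit

    def fit(self, df):
        fill_values = {}
        for scheme, columns in self.schemes.items():
            for column in columns:
                if column in fill_values:
                    raise ValueError(f"column {column!r} appears in multiple lists")
                if column not in df.columns:
                    raise KeyError(f"column {column!r} not recognized")
                fill_values[column] = FILL_RULES[scheme](df[column])
        self.fill_values = fill_values
        return self

    def transform(self, df):
        if self.fill_values is None:
            raise RuntimeError("fit must be called before transform")
        # A dict passed to fillna fills each column with its own value.
        return df.fillna(value=self.fill_values)
```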
-
transformers.sklearn_preprocessing_transformer module
Primrose wrapper around an sklearn preprocessor.
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class transformers.sklearn_preprocessing_transformer.SklearnPreprocessingTransformer(preprocessor, columns)
Bases: primrose.base.transformer.AbstractTransformer
-
fit(data)
User implements fit operation on a single data element from a data_object
- Parameters
data (object) – some data
- Returns
data
-
fit_transform(data)
Fit, then transform data.
- Parameters
data (data) – input data
- Returns
data, transformed
-
transform(data)
User implements internal transform function which operates on a single data element from a data_object
- Parameters
data (object) – input data
- Returns
data, transformed
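A sketch of how a wrapper like this could apply an sklearn preprocessor to the configured columns only, using StandardScaler as an example preprocessor. The class name is hypothetical, and the sketch assumes the preprocessor returns as many columns as it receives (true for scalers, not for encoders such as OneHotEncoder).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


class PreprocessColumnsSketch:
    """Hypothetical sketch: apply an sklearn preprocessor to selected columns."""

    def __init__(self, preprocessor, columns):
        self.preprocessor = preprocessor
        self.columns = list(columns)

    def fit(self, df):
        # Learn statistics (e.g. mean and scale) from the selected columns only.
        self.preprocessor.fit(df[self.columns])
        return self

    def transform(self, df):
        out = df.copy()
        # Overwrite the selected columns; all other columns pass through unchanged.
        out[self.columns] = self.preprocessor.transform(df[self.columns])
        return out

    def fit_transform(self, df):
        # Fit, then transform the same data.
        return self.fit(df).transform(df)
```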
-
transformers.strings module
Transformer that wraps pandas.Series.str methods
- Author(s):
Brian Graham (brian.graham@weightwatchers.com)
-
class transformers.strings.StringTransformer(method, columns, *args, **kwargs)
Bases: primrose.base.transformer.AbstractTransformer
Transforms Series of strings in a Series or DataFrame.
-
fit(df)
Fit data (a no-op here).
- Parameters
df (object) – some data
-
transform(df)
Applies string operation on dataframe columns.
- Parameters
df (pd.DataFrame) – pandas dataframe
- Returns
pandas dataframe
- Return type
pd.DataFrame
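The idea behind StringTransformer can be sketched by looking up a pandas.Series.str method by name and calling it on each listed column, forwarding any extra positional and keyword arguments. The function name is hypothetical and this is not primrose's implementation.

```python
import pandas as pd


def apply_str_method(df, method, columns, *args, **kwargs):
    """Hypothetical sketch: call Series.str.<method>(*args, **kwargs) on each column."""
    out = df.copy()
    for column in columns:
        # getattr resolves the accessor method by name, e.g. "lower" or "replace".
        out[column] = getattr(out[column].str, method)(*args, **kwargs)
    return out
```

For example, `apply_str_method(df, "replace", ["code"], "-", "_")` forwards `"-"` and `"_"` to `Series.str.replace`.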
-