transformers package

Submodules

transformers.categoricals module

Module to transform categorical variables into numeric ones

Author(s):

Mike Skarlinski (michael.skarlinski@weightwatchers.com)

class transformers.categoricals.ExplicitCategoricalTransform(categoricals)

Bases: primrose.base.transformer.AbstractTransformer

DEFAULT_NUMERIC = -9999
fit(data)

User implements fit operation on a single data element from a data_object

Parameters

data (object) – some data

Returns

data

transform(data)

Transform categorical variables into one or more numeric ones. There is no need to separate testing and training data.

Parameters

data – dictionary containing dataframe with all categorical columns present

Returns

data with all categorical columns recoded and/or deleted

class transformers.categoricals.ImplicitCategoricalTransform(target_variable)

Bases: primrose.base.transformer.AbstractTransformer

Class which implicitly transforms all string columns of a dataframe with sklearn LabelEncoder

fit(data)

Encode the data as categorical labels

Parameters

data (dataframe) –

Returns

dataframe (dataframe)

transform(data)

Transform data into categorical variables using pre-trained label encoder

Parameters

data (dataframe) –

Returns

dataframe (dataframe)
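
As a rough illustration of the fit/transform split described above, the following sketch fits one sklearn LabelEncoder per string column and reuses the fitted encoders later. It is not the primrose implementation: the helper names fit_label_encoders and apply_label_encoders are hypothetical, and the sketch ignores the target_variable argument, whose role is not documented here.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fit_label_encoders(df):
    """Fit one LabelEncoder per string column (hypothetical helper)."""
    encoders = {}
    for column in df.columns:
        if pd.api.types.is_string_dtype(df[column]):
            encoders[column] = LabelEncoder().fit(df[column])
    return encoders


def apply_label_encoders(df, encoders):
    """Return a copy of df with each fitted column replaced by integer labels."""
    encoded = df.copy()
    for column, encoder in encoders.items():
        encoded[column] = encoder.transform(encoded[column])
    return encoded


train = pd.DataFrame({"plan": ["basic", "premium", "basic"], "age": [31, 45, 27]})
encoders = fit_label_encoders(train)
encoded_train = apply_label_encoders(train, encoders)
```

LabelEncoder assigns labels in sorted order of the classes seen during fit, so a value that was not present at fit time raises an error at transform time.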

transformers.combine module

Custom combiner object which works via pandas left merges

Author(s):

Mike Skarlinski (michael.skarlinski@weightwatchers.com)

class transformers.combine.LeftJoinDataCombiner(join_key)

Bases: primrose.base.transformer.AbstractTransformer

Combines two dataframes with a left join

fit(data)

fit the data, here doing nothing

Parameters

data (list) – list of pandas data frames

Returns

nothing

transform(data)

Applies left joins to combine dataframes from different readers

Parameters

data (list) – list of dataframes

Returns

dataframe

transformers.combine.left_merge_dataframe_on_validated_join_keys(left_df, right_df, join_keys)

Merge two dataframes, or return left_df unchanged if right_df is None

Parameters
  • left_df – valid dataframe with join keys of matching data type (will be validated)

  • right_df – None or valid dataframe with join keys of matching data type (will be validated)

  • join_keys – list of keys to be validated on datatype and existence in left/right

Returns

joined dataframe object
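
The behaviour described above can be sketched with pandas directly. This is a minimal sketch under stated assumptions, not the library's code: the function name left_merge_sketch and the choice of KeyError and TypeError for validation failures are illustrative only.

```python
import pandas as pd


def left_merge_sketch(left_df, right_df, join_keys):
    """Left-join right_df onto left_df, or return left_df when right_df is None."""
    if right_df is None:
        return left_df
    for key in join_keys:
        # Each key must exist on both sides.
        if key not in left_df.columns or key not in right_df.columns:
            raise KeyError(f"join key {key!r} missing from one of the dataframes")
        # Each key must have the same dtype on both sides.
        if left_df[key].dtype != right_df[key].dtype:
            raise TypeError(f"join key {key!r} has mismatched dtypes")
    return left_df.merge(right_df, how="left", on=join_keys)


members = pd.DataFrame({"member_id": [1, 2, 3], "plan": ["a", "b", "a"]})
visits = pd.DataFrame({"member_id": [1, 3], "visits": [4, 7]})
joined = left_merge_sketch(members, visits, ["member_id"])
```

Because the join is a left join, every row of the left dataframe is kept, and rows without a match on the right get missing values in the right-hand columns.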

transformers.filter module

Transform data by keeping only the rows that pass the configured filtering operations

Author(s):

Michael Skarlinski (michael.skarlinski@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class transformers.filter.FilterByPandasExpression(feature_filters)

Bases: primrose.base.transformer.AbstractTransformer

Applies filters to data as defined in feature_filters

fit(data)

fit data, here just passing

Parameters

data (object) – some data

transform(data)

Applies filters to data as defined in feature_filters. This is necessary so we can filter rows in one reader based on information from another, and it therefore has to be applied after the combiner step.

The filters can operate on a single column with a fixed set of operations and a static value:

fixed operations: ==, !=, <>, <, <=, >, >=

The feature_filters object should be structured as a list of lists:

feature_filters: [[column, operation, static value], [column, operation, static value]]

Example: [["number_of_members", "<", 1000]] for filtering all rows with number_of_members less than 1000

Parameters
  • data (dict) – dictionary with dataframes from all readers

  • data_key (str) – key to pull the dataframe from within the data object

  • feature_filters (list) – list of lists with columns and operators to filter on

  • instance_name (str) – name of this pipeline instance

Returns

dataframe with filtered data

Raises

Exception if not a pandas dataframe, operation not supported, or column name not recognized
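
The list-of-lists filter format can be sketched as follows. This is an illustrative sketch rather than the class's actual code: the name apply_feature_filters and the OPERATIONS table are hypothetical, and the sketch assumes that multiple filters are applied in sequence, so a row must pass all of them.

```python
import operator

import pandas as pd

# Map each supported operation string to a comparison function.
OPERATIONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def apply_feature_filters(df, feature_filters):
    """Keep only the rows that satisfy every [column, operation, value] filter."""
    if not isinstance(df, pd.DataFrame):
        raise Exception("data is not a pandas dataframe")
    for column, operation, value in feature_filters:
        if column not in df.columns:
            raise Exception(f"column {column!r} not recognized")
        if operation not in OPERATIONS:
            raise Exception(f"operation {operation!r} not supported")
        # Comparing a column to a static value yields a boolean row mask.
        df = df[OPERATIONS[operation](df[column], value)]
    return df


groups = pd.DataFrame(
    {"number_of_members": [250, 1000, 4000], "region": ["east", "west", "east"]}
)
small_groups = apply_feature_filters(groups, [["number_of_members", "<", 1000]])
large_not_west = apply_feature_filters(
    groups, [["number_of_members", ">=", 1000], ["region", "<>", "west"]]
)
```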

transformers.impute module

Custom imputer to replace values within pandas dataframe columns with config-specified imputation schemes

Author(s):

Reka Daniel-Weiner (reka.danielweiner@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class transformers.impute.ColumnSpecificImpute(columns_to_zero, columns_to_mean, columns_to_median, columns_to_mode, columns_to_infinity, columns_to_neg_infinity)

Bases: primrose.base.transformer.AbstractTransformer

Replace NULL values in config-specified columns with zero, mean, median, mode, inf or negative inf

fit(data)

Fit encoder imputation values to dataframe metrics. Create a dictionary mapping each column name to its imputed value. These values might be straight constants or might be a function of the complete column’s values, such as mode or median.

Parameters

data (pandas data frame) – a data frame

Returns

Nothing. Updates internal dictionary of column name to imputed value

Raises

Exception if a column appears in multiple lists, or if column not recognized

transform(data)

Impute columns in data according to the imputations fit by self.fit

Parameters

data (dataframe) –

Returns

data (dataframe)

Raises

Exception if fit is not called before transform
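
The two-step scheme (fit builds a dictionary of column name to fill value, transform applies it) can be sketched with plain pandas. The helper names below are hypothetical and the sketch covers only the zero, mean, median and mode schemes; the infinity schemes would add constant entries in the same way.

```python
import pandas as pd


def fit_imputations(
    df,
    columns_to_zero=(),
    columns_to_mean=(),
    columns_to_median=(),
    columns_to_mode=(),
):
    """Build a dictionary mapping each column name to its fill value."""
    schemes = [
        (columns_to_zero, lambda s: 0),
        (columns_to_mean, lambda s: s.mean()),
        (columns_to_median, lambda s: s.median()),
        (columns_to_mode, lambda s: s.mode().iloc[0]),
    ]
    imputations = {}
    for columns, compute in schemes:
        for column in columns:
            if column not in df.columns:
                raise Exception(f"column {column!r} not recognized")
            if column in imputations:
                raise Exception(f"column {column!r} appears in multiple lists")
            imputations[column] = compute(df[column])
    return imputations


def apply_imputations(df, imputations):
    """Fill missing values column by column using the fitted dictionary."""
    return df.fillna(value=imputations)


raw = pd.DataFrame(
    {"a": [1.0, None, 3.0], "b": [None, 2.0, 2.0], "c": [5.0, None, 1.0]}
)
imputations = fit_imputations(
    raw, columns_to_zero=["c"], columns_to_mean=["a"], columns_to_mode=["b"]
)
filled = apply_imputations(raw, imputations)
```

Computing the fill values once in the fit step means the same values can later be applied to new data, rather than being recomputed from whatever dataframe is passed to the transform step.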

transformers.sklearn_preprocessing_transformer module

primrose wrapper around sklearn preprocessor

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class transformers.sklearn_preprocessing_transformer.SklearnPreprocessingTransformer(preprocessor, columns)

Bases: primrose.base.transformer.AbstractTransformer

fit(data)

User implements fit operation on a single data element from a data_object

Parameters

data (object) – some data

Returns

data

fit_transform(data)

fit then transform data

Parameters

data (data) – input data

Returns

data, transformed

transform(data)

User implements internal transform function which operates on a single data element from a data_object

Parameters

data (object) – input data

Returns

data, transformed
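
The docstrings above are inherited from the base class and do not say how the preprocessor and columns arguments interact. One plausible reading, sketched below, is that an sklearn preprocessor is fitted on, and applied to, only the listed columns. The class ColumnPreprocessorSketch is a hypothetical stand-in written for this example; it is not the primrose class and its return values are an assumption.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


class ColumnPreprocessorSketch:
    """Hypothetical stand-in: apply an sklearn preprocessor to selected columns."""

    def __init__(self, preprocessor, columns):
        self.preprocessor = preprocessor
        self.columns = columns

    def fit(self, data):
        # Learn the preprocessor's parameters from the selected columns only.
        self.preprocessor.fit(data[self.columns])
        return self

    def transform(self, data):
        # Leave the input untouched and overwrite the selected columns in a copy.
        transformed = data.copy()
        transformed[self.columns] = self.preprocessor.transform(data[self.columns])
        return transformed

    def fit_transform(self, data):
        return self.fit(data).transform(data)


people = pd.DataFrame(
    {"height": [150.0, 160.0, 170.0], "name": ["ann", "bo", "cy"]}
)
scaled = ColumnPreprocessorSketch(MinMaxScaler(), ["height"]).fit_transform(people)
```

With MinMaxScaler, the height column is rescaled to the range 0 to 1 while the name column passes through unchanged.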

transformers.strings module

Transformer that wraps around pandas.Series.str methods

Author(s):

Brian Graham (brian.graham@weightwatchers.com)

class transformers.strings.StringTransformer(method, columns, *args, **kwargs)

Bases: primrose.base.transformer.AbstractTransformer

Transforms Series of strings in a Series or DataFrame.

fit(df)

fit data, here just passing.

Parameters

df (object) – some data

transform(df)

Applies string operation on dataframe columns.

Parameters

df (pd.DataFrame) – pandas dataframe

Returns

pandas dataframe

Return type

df (pd.DataFrame)
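
Wrapping pandas.Series.str methods by name can be sketched as below. The function apply_string_method is a hypothetical helper that mirrors the (method, columns, *args, **kwargs) constructor signature; how the real class handles a bare Series or invalid method names is not covered here.

```python
import pandas as pd


def apply_string_method(df, method, columns, *args, **kwargs):
    """Call the named pandas .str method on each listed column of a copy of df."""
    result = df.copy()
    for column in columns:
        # Look up the method on the .str accessor, e.g. "strip" or "upper".
        string_method = getattr(result[column].str, method)
        result[column] = string_method(*args, **kwargs)
    return result


people = pd.DataFrame({"name": ["  Ada ", "grace"], "team": ["Core", "Infra"]})
cleaned = apply_string_method(people, "strip", ["name"])
upper = apply_string_method(cleaned, "upper", ["name", "team"])
renamed = apply_string_method(
    people, "replace", ["team"], "Core", "Platform", regex=False
)
```

Extra positional and keyword arguments are forwarded unchanged, so any argument accepted by the underlying .str method can be supplied.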

Module contents