pipelines package

Submodules

pipelines.dataframe_joiner module

Module to join upstream dataframes

Author(s):

Michael Skarlinski (michael.skarlinski@weightwatchers.com)

class pipelines.dataframe_joiner.DataFrameJoiner(configuration, instance_name)

Bases: primrose.base.pipeline.AbstractPipeline

Join upstream dataframes

init_pipeline()

create the pipeline’s TransformerSequence

Returns

a TransformerSequence

static necessary_config(node_config)

Return the necessary configuration keys for the DataFrameJoiner object

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

start_table: first table index in alpha order which defines who is eligible for this analysis (all other tables will be left joined to this)

join_key: list of column names to join dataframes from different readers on

Returns

set of keys

transform(data_object)

Get DataFrames from the data object, then put them into an alphabetical list for constant join order

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)
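The join described above can be sketched with plain pandas. This is only an illustration under stated assumptions, not the DataFrameJoiner implementation: the helper `join_upstream`, the dictionary of frames, and the table names are hypothetical, and how the real class resolves start_table is not shown in this reference.

```python
import pandas as pd


def join_upstream(frames, start_table, join_key):
    """Left-join every other frame onto frames[start_table].

    Hypothetical helper: frames maps a table name to a DataFrame.
    The remaining tables are visited in alphabetical order so the
    join order is the same on every run.
    """
    joined = frames[start_table]
    for name in sorted(frames):
        if name == start_table:
            continue
        joined = joined.merge(frames[name], on=join_key, how="left")
    return joined


# Made-up example tables
frames = {
    "b_orders": pd.DataFrame({"user_id": [1, 3], "orders": [5, 7]}),
    "a_users": pd.DataFrame({"user_id": [1, 2], "age": [30, 41]}),
}
result = join_upstream(frames, start_table="a_users", join_key=["user_id"])
```

Because every join is a left join, only rows present in the start table survive; user 3 from `b_orders` is dropped, and user 2 gets a missing value for `orders`.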

pipelines.encode_train_test_split module

Encode all string df columns to numeric labels, inherit test/train split from parent

Author(s):

Mike Skarlinski (michael.skarlinski@weightwatchers.com)

class pipelines.encode_train_test_split.EncodeTrainTestSplit(configuration, instance_name)

Bases: primrose.pipelines.train_test_split.TrainTestSplit

Encode all string df columns to numeric labels, inherit test/train split from parent

final_data_object_additions(data_object)

Overload function which adds the label encoder after running fit_transform or transform

Returns

instance of DataObject

Return type

data_object (DataObject)

property first_transformer_in_sequence

returns 1st transformer in a sequence

Returns

a transformer

init_pipeline()

create the pipeline’s TransformerSequence

Returns

a TransformerSequence
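As a rough illustration of encoding string columns to numeric labels, the sketch below learns a mapping on training data and reuses it on test data. It uses plain pandas as a stand-in and is not the transformer that EncodeTrainTestSplit actually builds; `fit_label_maps` and `apply_label_maps` are hypothetical helper names, and the sketch assumes the string columns contain no missing values.

```python
import pandas as pd


def fit_label_maps(train):
    """Learn a label -> integer mapping for each string column (hypothetical helper)."""
    maps = {}
    for col in train.columns:
        if pd.api.types.is_string_dtype(train[col]):
            labels = sorted(train[col].unique())
            maps[col] = {label: code for code, label in enumerate(labels)}
    return maps


def apply_label_maps(df, maps):
    """Replace string labels with the integers learned by fit_label_maps."""
    out = df.copy()
    for col, mapping in maps.items():
        out[col] = out[col].map(mapping)
    return out


# Made-up example data
train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
test = pd.DataFrame({"color": ["blue", "red"], "size": [4, 5]})

maps = fit_label_maps(train)
encoded_train = apply_label_maps(train, maps)
encoded_test = apply_label_maps(test, maps)
```

In this sketch a label that appears only in the test data maps to a missing value, since no code was learned for it during fitting.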

pipelines.sklearn_preprocessing_pipeline module

Module to run preprocessing using SKlearn preprocessors

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class pipelines.sklearn_preprocessing_pipeline.SklearnPreprocessingPipeline(configuration, instance_name)

Bases: primrose.pipelines.train_test_split.TrainTestSplit

init_pipeline()

create the pipeline’s TransformerSequence

Returns

a TransformerSequence

static necessary_config(node_config)

Return the necessary configuration keys for the SklearnPreprocessingPipeline object

Returns

set of keys

pipelines.train_test_split module

Module to run train test splits

Author(s):

Mike Skarlinski (michael.skarlinski@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class pipelines.train_test_split.TrainTestSplit(configuration, instance_name)

Bases: primrose.base.pipeline.AbstractPipeline

Parent pipeline to split into training / testing data, and run a transform

Note

This class splits data into testing and training portions, then writes the objects to the data_object. If this is the only pipeline operation needed, you can use this class directly as a pipeline. Otherwise, make a child class and write an init_pipeline method to perform operations on your data.

This class also handles writing your transformer_sequence into the data_object, so there’s no need to write it in child classes.

features(data)

Use user-specified features if available, otherwise use all non-target columns

Parameters

data (DataFrame) – pandas dataframe

Returns

list of feature names
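The fallback rule for features can be sketched as follows. The function `select_features` and the `user_features` argument are hypothetical; this reference does not show which configuration key holds user-specified features.

```python
def select_features(columns, target_variable, user_features=None):
    """Return user-specified features if given, else all non-target columns.

    Hypothetical helper: columns is any iterable of column names,
    for example DataFrame.columns.
    """
    if user_features:
        return list(user_features)
    return [col for col in columns if col != target_variable]


# Made-up column names
columns = ["age", "plan", "churned"]
default_features = select_features(columns, target_variable="churned")
chosen_features = select_features(columns, "churned", user_features=["age"])
```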

final_data_object_additions(data_object)

DataObject: Template method to be overloaded in child classes

Note

Allows for child class TransformerSequence objects to be added into the data_object

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)

fit_transform(data_object)

Split data into testing and training sets, then apply the categorical transform to each

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)
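A seeded row split of the kind described here can be sketched with pandas. This is an assumption about the general technique, not the code TrainTestSplit runs; `split_rows` is a hypothetical helper, and it assumes the DataFrame index is unique.

```python
import pandas as pd


def split_rows(data, training_fraction, seed):
    """Randomly assign a fraction of rows to training and the rest to testing.

    Hypothetical helper: the seed makes the row selection repeatable.
    """
    train = data.sample(frac=training_fraction, random_state=seed)
    test = data.drop(train.index)
    return train, test


# Made-up example data: ten rows
data = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})
train, test = split_rows(data, training_fraction=0.8, seed=42)
train_again, _ = split_rows(data, training_fraction=0.8, seed=42)
```

Calling the helper twice with the same seed selects the same training rows, which is what makes a later transform on held-out data comparable across runs.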

static necessary_config(node_config)

Return the necessary configuration keys for the TrainTestSplit object

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

target_variable (string): column name which holds the target variable

training_fraction (float): 0->1 float for the fraction of data rows to be used for training

seed (int): random number to control the stochastic row selection

Returns

set of keys
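A node configuration carrying the keys listed in the note might look like the sketch below. The values, the `REQUIRED_KEYS` constant, and the `missing_keys` helper are illustrative assumptions; the exact set returned by necessary_config is not spelled out beyond the note above.

```python
# Keys taken from the note above; assumed to be the required set
REQUIRED_KEYS = {"target_variable", "training_fraction", "seed"}


def missing_keys(node_config):
    """Return the required keys that are absent from node_config (hypothetical helper)."""
    return REQUIRED_KEYS - set(node_config)


# Made-up configuration values
node_config = {
    "target_variable": "churned",  # column holding the target
    "training_fraction": 0.8,      # share of rows used for training
    "seed": 42,                    # controls the random row selection
}

complete = missing_keys(node_config)
incomplete = missing_keys({"target_variable": "churned"})
```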

transform(data_object)

Transform the data into label encoded data using the pre-trained transformer sequence

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)

Module contents