primrose.pipelines package¶
Submodules¶
primrose.pipelines.dataframe_joiner module¶
Module to join upstream dataframes
- Author(s):
Michael Skarlinski (michael.skarlinski@weightwatchers.com)
-
class
primrose.pipelines.dataframe_joiner.
DataFrameJoiner
(configuration, instance_name)¶ Bases:
primrose.base.pipeline.AbstractPipeline
Join upstream dataframes
-
init_pipeline
()¶ create the pipeline’s TransformerSequence
- Returns
a TransformerSequence
-
static
necessary_config
(node_config)¶ Return the necessary configuration keys for the DataFrameJoiner object
- Parameters
node_config (dict) – set of parameters / attributes for the node
Note
start_table: first table index in alphabetical order, which defines who is eligible for this analysis (all other tables will be left joined to this one)
join_key: list of column names on which to join dataframes from different readers
- Returns
set of keys
-
transform
(data_object)¶ Get DataFrames from the data object, then put them into an alphabetical list so that the join order is consistent
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
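The join behavior described above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not primrose's implementation: the helper `join_upstream`, the dict of named frames, and the treatment of `start_table` as a key into that dict are all hypothetical.

```python
import pandas as pd


def join_upstream(frames, start_table, join_key):
    """Left join every other frame onto frames[start_table].

    `frames` is a hypothetical dict mapping an upstream name to a DataFrame.
    """
    result = frames[start_table]
    # Sort the names so the join order is the same on every run.
    for name in sorted(frames):
        if name == start_table:
            continue
        result = result.merge(frames[name], how="left", on=join_key)
    return result


frames = {
    "users": pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]}),
    "orders": pd.DataFrame({"id": [1, 3], "total": [9.5, 4.0]}),
}
joined = join_upstream(frames, start_table="users", join_key=["id"])
```

Because every join is a left join onto the start table, rows that have no match in another table are kept and their new columns are filled with missing values.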
-
primrose.pipelines.encode_train_test_split module¶
Encode all string df columns to numeric labels, inherit test/train split from parent
- Author(s):
Mike Skarlinski (michael.skarlinski@weightwatchers.com)
-
class
primrose.pipelines.encode_train_test_split.
EncodeTrainTestSplit
(configuration, instance_name)¶ Bases:
primrose.pipelines.train_test_split.TrainTestSplit
Encode all string df columns to numeric labels, inherit test/train split from parent
-
final_data_object_additions
(data_object)¶ Overload function which adds the label encoder after running fit_transform or transform
- Returns
instance of DataObject
- Return type
data_object (DataObject)
-
property
first_transformer_in_sequence
¶ Return the first transformer in the sequence
- Returns
a transformer
-
init_pipeline
()¶ create the pipeline’s TransformerSequence
- Returns
a TransformerSequence
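The idea of encoding string columns to numeric labels, learned on the training rows and reused on the test rows, might look like the sketch below. It uses pandas categorical codes as a stand-in for a label encoder; the helpers `fit_encoders` and `encode` are hypothetical and are not part of primrose.

```python
import pandas as pd


def fit_encoders(train):
    """Learn a sorted category list for each string column of the training data."""
    return {
        col: sorted(train[col].dropna().unique())
        for col in train.columns
        if pd.api.types.is_string_dtype(train[col])
    }


def encode(df, encoders):
    """Replace each encoded column with integer codes; unseen values become -1."""
    out = df.copy()
    for col, categories in encoders.items():
        out[col] = pd.Categorical(out[col], categories=categories).codes
    return out


train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
test = pd.DataFrame({"color": ["blue", "green"], "size": [4, 5]})

encoders = fit_encoders(train)
train_encoded = encode(train, encoders)
test_encoded = encode(test, encoders)
```

Fitting on the training rows only means a value that first appears in the test rows (here `"green"`) has no learned label, which this sketch marks with `-1`.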
-
primrose.pipelines.sklearn_preprocessing_pipeline module¶
Module to run preprocessing using scikit-learn preprocessors
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.pipelines.sklearn_preprocessing_pipeline.
SklearnPreprocessingPipeline
(configuration, instance_name)¶ Bases:
primrose.pipelines.train_test_split.TrainTestSplit
-
init_pipeline
()¶ create the pipeline’s TransformerSequence
- Returns
a TransformerSequence
-
static
necessary_config
(node_config)¶ Return the necessary configuration keys for the SklearnPreprocessingPipeline object
- Returns
set of keys
-
primrose.pipelines.train_test_split module¶
Module to run train test splits
- Author(s):
Mike Skarlinski (michael.skarlinski@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.pipelines.train_test_split.
TrainTestSplit
(configuration, instance_name)¶ Bases:
primrose.base.pipeline.AbstractPipeline
Parent pipeline to split into training / testing data, and run a transform
Note
This class splits data into training and testing portions, then writes the resulting objects to the data_object. If this is the only pipeline operation you need, you can use this class directly as a pipeline. Otherwise, make a child class and write an init_pipeline method to perform operations on your data.
This class also handles writing your transformer_sequence into the data_object, so child classes do not need to do so themselves.
-
features
(data)¶ Use user-specified features if available, otherwise use all non-target columns
- Parameters
data (DataFrame) – pandas dataframe
- Returns
list of feature names
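The fallback rule described for this method can be illustrated as follows. This is a sketch only: the function `select_features` and its `features` and `target_variable` arguments are assumptions made for the example, not the method's actual signature.

```python
import pandas as pd


def select_features(data, target_variable, features=None):
    """Return user-specified features if given, else every non-target column."""
    if features:
        return list(features)
    return [col for col in data.columns if col != target_variable]


data = pd.DataFrame({"age": [30, 41], "height": [170, 182], "label": [0, 1]})

all_features = select_features(data, target_variable="label")
chosen = select_features(data, target_variable="label", features=["age"])
```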
-
final_data_object_additions
(data_object)¶ Template method to be overloaded in child classes
Note
Allows for child class TransformerSequence objects to be added into the data_object
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
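The template-method arrangement described here can be sketched in isolation. The classes below are hypothetical stand-ins, not primrose classes, and a plain dict is assumed in place of a DataObject.

```python
class SplitPipelineSketch:
    """Hypothetical parent: does the shared work, then calls the hook."""

    def fit_transform(self, data_object):
        # Placeholder for the shared split-and-transform step.
        data_object = dict(data_object, split_done=True)
        return self.final_data_object_additions(data_object)

    def final_data_object_additions(self, data_object):
        # Default hook: add nothing.
        return data_object


class EncodingPipelineSketch(SplitPipelineSketch):
    """Hypothetical child: overrides only the hook."""

    def final_data_object_additions(self, data_object):
        # Attach an extra artifact after the parent's shared step has run.
        return dict(data_object, label_encoder="fitted-encoder-placeholder")


parent_result = SplitPipelineSketch().fit_transform({"rows": 10})
child_result = EncodingPipelineSketch().fit_transform({"rows": 10})
```

The child class changes what ends up in the result without reimplementing `fit_transform`, which is the point of exposing the hook.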
-
fit_transform
(data_object)¶ Split data into training and testing sets, then apply the categorical transform to each
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
-
static
necessary_config
(node_config)¶ Return the necessary configuration keys for the TrainTestSplit object
- Parameters
node_config (dict) – set of parameters / attributes for the node
Note
target_variable (string): column name which holds the target variable
training_fraction (float): fraction of data rows to be used for training, between 0 and 1
seed (int): random number to control the stochastic row selection
- Returns
set of keys
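How the three keys above could drive a reproducible row split is sketched below. It uses `DataFrame.sample` as a stand-in and may differ from the routine primrose actually calls; the helper `split_rows` and the example `config` dict are hypothetical.

```python
import pandas as pd


def split_rows(data, training_fraction, seed):
    """Randomly assign a fraction of rows to training; the rest go to testing."""
    train = data.sample(frac=training_fraction, random_state=seed)
    test = data.drop(train.index)
    return train, test


# Hypothetical node configuration holding the three keys from the note above.
config = {"target_variable": "y", "training_fraction": 0.8, "seed": 42}

data = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})
train, test = split_rows(data, config["training_fraction"], config["seed"])

# Separate the target column from the remaining columns in each portion.
y_train = train[config["target_variable"]]
X_train = train.drop(columns=[config["target_variable"]])
```

Reusing the same seed yields the same row assignment, so a run can be repeated with identical training and testing sets.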
-
transform
(data_object)¶ Transform the data into label encoded data using the pre-trained transformer sequence
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
-