primrose.base package

Submodules

primrose.base.conditional_path_node module

Module with abstract conditional path class to specify the interface needed for conditional pathing nodes

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.conditional_path_node.AbstractConditionalPath(configuration, instance_name)

Bases: primrose.base.node.AbstractNode

A class that supports conditional pathing through the DAG. After running, destinations_to_prune() can provide a list of destination nodes, each representing the start of a path to prune. DAGRunner can then prune those nodes, and all paths downstream of them, from the DAG.

all_nodes_to_prune()

What are all the nodes we should prune from the DAG?

Note

This calls destinations_to_prune() and then uses the DAG to identify the complete subgraphs starting at those destinations.

Returns

set of all nodes to prune

abstract destinations_to_prune()

Which destinations, if any, should we prune from DAG?

Returns

list of destination nodes to prune from the DAG, None otherwise
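The relationship between destinations_to_prune() and all_nodes_to_prune() can be sketched as a graph traversal. This is a minimal illustration and not primrose's implementation: the DAG is assumed to be a plain adjacency dict, and downstream_nodes and the node names are hypothetical.

```python
from collections import deque


def downstream_nodes(dag, destinations):
    """Return the destinations plus every node reachable from them.

    `dag` maps each node name to a list of its direct successors.
    """
    if not destinations:
        # Nothing to prune (covers the "None otherwise" return case).
        return set()
    to_prune = set()
    queue = deque(destinations)
    while queue:
        node = queue.popleft()
        if node in to_prune:
            continue
        to_prune.add(node)
        queue.extend(dag.get(node, []))
    return to_prune


# Hypothetical DAG: a conditional node fans out to two branches.
dag = {
    "condition": ["left", "right"],
    "left": ["left_writer"],
    "right": ["right_writer"],
    "left_writer": [],
    "right_writer": [],
}

pruned = downstream_nodes(dag, ["right"])
print(sorted(pruned))  # ['right', 'right_writer']
```

Pruning the "right" destination removes that node and its writer, while the "left" branch stays in the DAG.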

primrose.base.model module

Module with abstract model class to specify interface needed for future models

Author(s):

Michael Skarlinski (michael.skarlinski@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.model.AbstractModel(configuration, instance_name)

Bases: primrose.base.node.AbstractNode

Model class should be able to train, evaluate or predict

abstract eval_model(data_object)

Evaluate a previously trained model performance using labeled data

Method should be able to load a serialized model if necessary (load_model method), or work from a model trained in-scope with the train_model method. This method should calculate and store some aspect of model error into attributes that are serialized with the save_model method

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)

abstract static necessary_config(node_config)

Return a list of necessary configuration keys within the implementation

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

After adding this list, validation automatically occurs before instantiation in the pipeline factory.

Returns

set of keys necessary to run implementation

abstract predict(data_object)

Make predictions using a pre-trained model on the features in the feature_df dataframe

Using a pre-trained model (either from an in-scope run of train_model or a call to the load_model method) this method should append predictions to each row of features in the input, feature_df. This method may also add feature importance columns, for importance analysis

Parameters

data_object – DataObject instance

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): terminate the DAG?

Return type

(tuple)

run(data_object)

run the model

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)

abstract train_model(data_object)

Train an internal model attribute, then save the trained model with the save_model method

Using the training features in feature_df (a pandas DataFrame) and the target_variable (a pandas Series), cross-validation or other training steps happen in this method, according to the values in parameters

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)
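To make the train, predict, and evaluate responsibilities concrete, here is a small sketch. It deliberately does not subclass AbstractModel: MeanModel is hypothetical, the data object is assumed to be a plain dict, and every method returns the data object alone as a simplification.

```python
class MeanModel:
    """Hypothetical model: predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None  # learned in train_model
        self.mae_ = None   # error metric stored by eval_model

    def train_model(self, data_object):
        targets = data_object["target"]
        self.mean_ = sum(targets) / len(targets)
        return data_object

    def predict(self, data_object):
        # Append one prediction per feature row.
        n_rows = len(data_object["features"])
        data_object["predictions"] = [self.mean_] * n_rows
        return data_object

    def eval_model(self, data_object):
        # Store mean absolute error on labeled data as a model attribute.
        targets = data_object["target"]
        self.mae_ = sum(abs(t - self.mean_) for t in targets) / len(targets)
        return data_object


data = {"features": [[1], [2], [3]], "target": [1.0, 2.0, 3.0]}
model = MeanModel()
model.train_model(data)
model.predict(data)
model.eval_model(data)
print(data["predictions"], model.mae_)
```

A real implementation would also serialize the trained state, which this sketch leaves out.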

primrose.base.node module

Top level notion of a node in the graph

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.node.AbstractNode(configuration, instance_name)

Bases: abc.ABC

abstract static necessary_config(node_config)

Return a list of necessary configuration keys within the implementation

Parameters

node_config (dict) – set of parameters / attributes for the node

After adding this list, validation automatically occurs before instantiation in the configuration

Returns

set of keys necessary to run implementation

abstract run(data_object)

run the node. For a reader, that means read, for a writer that means write etc.

Parameters

data_object (DataObject) – DataObject instance

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): terminate the DAG?

Return type

(tuple)
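The node contract described above (a static necessary_config plus a run that returns a data object and a terminate flag) might look like the following sketch. SketchNode is a stand-in for AbstractNode, not the real class; the configuration and data object are assumed to be plain dicts, and UppercaseNode is hypothetical.

```python
from abc import ABC, abstractmethod


class SketchNode(ABC):
    """Stand-in for AbstractNode; hypothetical, for illustration only."""

    def __init__(self, configuration, instance_name):
        self.configuration = configuration
        self.instance_name = instance_name

    @staticmethod
    @abstractmethod
    def necessary_config(node_config):
        """Return the set of keys the node needs."""

    @abstractmethod
    def run(self, data_object):
        """Return (data_object, terminate)."""


class UppercaseNode(SketchNode):
    @staticmethod
    def necessary_config(node_config):
        return {"field"}

    def run(self, data_object):
        field = self.configuration["field"]
        data_object[field] = data_object[field].upper()
        terminate = False  # let the DAG continue
        return data_object, terminate


config = {"field": "name"}
# Validation step: every necessary key must be present before instantiation.
missing = UppercaseNode.necessary_config(config) - set(config)
node = UppercaseNode(config, "uppercase_name")
result, terminate = node.run({"name": "primrose"})
print(result, terminate)  # {'name': 'PRIMROSE'} False
```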

primrose.base.pipeline module

Module with abstract pipeline class to specify interface needed for future pipelines

Author(s):

Michael Skarlinski (michael.skarlinski@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.pipeline.AbstractPipeline(configuration, instance_name)

Bases: primrose.base.node.AbstractNode

Pipeline class should have a defined pipeline that it executes and the ability to transform raw data

check_for_upstream_transformers(data_object)

Examine the upstream data_object for any TransformerSequence

if not found, then initialize a new TransformerSequence

Parameters

data_object (DataObject) – instance of DataObject

Returns

TransformerSequence

execute_pipeline(input_, mode)

Run a TransformerSequence of functions with chained input and output data

Parameters
  • input_ (object) – input data (usually a pandas dataframe)

  • mode – enum object for fit, transform, or fit_transform

Returns

transformed data (usually a pandas dataframe) after running through all functions in the pipeline
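One way to picture execute_pipeline is as a loop that feeds each transformer's output into the next, choosing the method by mode. The sketch below is an assumption about the general shape, not primrose's code: Mode, SubtractMin, Double, and this standalone execute_pipeline function are hypothetical, and plain lists stand in for dataframes.

```python
from enum import Enum


class Mode(Enum):
    FIT = "FIT"
    TRANSFORM = "TRANSFORM"
    FIT_TRANSFORM = "FIT_TRANSFORM"


class SubtractMin:
    """Hypothetical transformer: learns the minimum, then subtracts it."""

    def __init__(self):
        self.minimum = None

    def fit(self, data):
        self.minimum = min(data)
        return data

    def transform(self, data):
        return [x - self.minimum for x in data]

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)


class Double:
    """Hypothetical stateless transformer: doubles every value."""

    def fit(self, data):
        return data

    def transform(self, data):
        return [2 * x for x in data]

    def fit_transform(self, data):
        return self.transform(data)


def execute_pipeline(transformers, input_, mode):
    """Chain the transformers, passing each output to the next step."""
    data = input_
    for transformer in transformers:
        if mode is Mode.FIT:
            data = transformer.fit(data)
        elif mode is Mode.TRANSFORM:
            data = transformer.transform(data)
        elif mode is Mode.FIT_TRANSFORM:
            data = transformer.fit_transform(data)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return data


steps = [SubtractMin(), Double()]
train_out = execute_pipeline(steps, [3, 5, 7], Mode.FIT_TRANSFORM)
test_out = execute_pipeline(steps, [4, 10], Mode.TRANSFORM)
print(train_out, test_out)  # [0, 4, 8] [2, 14]
```

The second call reuses the minimum learned during the first, which is the reason fit state has to be cached between fit_transform and transform.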

fit_transform(data_object)

Clean/transform or filter data using a pipeline of functions

The method should also cache results and report on sizing for debugging. fit_transform must store the information necessary for data transformations on test data, so any encodings or model-based imputations must be cached in this method, to be called when the transform method is used.

Parameters

data_object (DataObject) – DataObject instance

Returns

DataObject instance

Return type

data_object (DataObject)

init_pipeline()

Initialize the pipeline if no pipeline object is found in the upstream data objects

Returns

TransformerSequence

static necessary_config(node_config)

Return a list of necessary configuration keys within the implementation

Parameters

node_config (dict) – set of parameters / attributes for the node

After adding this list, validation automatically occurs before instantiation in the pipeline factory.

Returns

set of keys necessary to run implementation

run(data_object)

Run pipeline on the data object

Parameters

data_object (DataObject) – instance of DataObject

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): terminate the DAG?

Return type

(tuple)

abstract transform(data_object)

Clean/transform or filter data using a pipeline of functions and any cached objects from fit_transform

The method should cache post-transform results and report on sizing for debugging. transform uses the cached objects stored in the fit_transform call to transform data. It’s likely to be used for live predictions or test (hold-out) data.

Parameters

data_object (DataObject) – DataObject instance

Returns

data_object (DataObject)

class primrose.base.pipeline.PipelineModeType

Bases: enum.Enum

Mode when performing the pipeline

Note

FIT = fit data to transformer object only
TRANSFORM = transform data only from (previously) fit transformers in a pipeline
FIT_TRANSFORM = fit data then transform data in a pipeline

FIT = 'FIT'
FIT_TRANSFORM = 'FIT_TRANSFORM'
TRANSFORM = 'TRANSFORM'
names = <function PipelineModeType.names>
values = <function PipelineModeType.values>
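An enum with helper functions like the ones listed above could be written as follows. This is only a sketch: ModeSketch is a hypothetical name, and the assumption that names() and values() return lists in definition order is not confirmed by the listing.

```python
from enum import Enum


class ModeSketch(Enum):
    FIT = "FIT"
    TRANSFORM = "TRANSFORM"
    FIT_TRANSFORM = "FIT_TRANSFORM"

    @staticmethod
    def names():
        # Member names, in definition order (assumed return shape).
        return [member.name for member in ModeSketch]

    @staticmethod
    def values():
        # Member values, in definition order (assumed return shape).
        return [member.value for member in ModeSketch]


# Look up a member from a configuration string.
mode = ModeSketch("FIT_TRANSFORM")
print(mode, ModeSketch.names())
```

Because the values are strings, a mode read from a configuration file can be converted to a member with a plain call to the enum.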

primrose.base.postprocess module

Module with abstract postprocess class to specify interface needed for future postprocesses

Author(s):

Michael Skarlinski (michael.skarlinski@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.postprocess.AbstractPostprocess(configuration, instance_name)

Bases: primrose.base.node.AbstractNode

Postprocess module which must have a postprocess method to send data to an external source

primrose.base.reader module

A module that reads something from somewhere

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.reader.AbstractReader(configuration, instance_name)

Bases: primrose.base.node.AbstractNode

read some data from somewhere

primrose.base.search_engine module

Abstract base class that performs TFIDF on some corpus

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.search_engine.AbstractSearchEngine(configuration, instance_name)

Bases: primrose.base.model.AbstractModel

an abstract search engine

cosine_similarity_matrix()

compute the cosine similarities between all document pairs in the corpus

Returns

square matrix of cosine similarities where index of matrix is index of corpus IDs

Return type

matrix (numpy)
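For reference, the cosine similarity between all pairs of document vectors can be computed as below. The sketch uses plain Python lists rather than numpy to stay dependency-free, and the function and the toy vectors are hypothetical; it shows the general calculation, not this class's implementation.

```python
import math


def pairwise_cosine(vectors):
    """Return a square matrix of cosine similarities between all rows."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    size = len(vectors)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            denom = norms[i] * norms[j]
            # Guard against all-zero vectors, which have no direction.
            matrix[i][j] = dot / denom if denom else 0.0
    return matrix


# Toy term-weight vectors for three documents (made-up values).
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
similarities = pairwise_cosine(doc_vectors)
print(similarities[0][2])
```

Row and column i of the result both refer to the i-th document, which matches the description of the matrix index following the corpus IDs.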

eval_model(data_object)

evaluate the model

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)

static necessary_config(node_config)

Return a list of necessary configuration keys for AbstractSearchEngine

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

id_key: key used in the corpus object for ids
doc_key: key used in the corpus object for docs

Returns

set of keys necessary to run AbstractSearchEngine

predict(data_object)

make predictions from this corpus, meaning create cosine similarity matrix

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)

abstract tokenize(s)

Given some string, tokenize it

Parameters

s (str) – input string

Returns

list of tokens

train_model(data_object)

train the model which means run fit_transform on a TFIDF model

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)

primrose.base.sql_reader module

Abstract reader that uses SQL

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.sql_reader.AbstractSqlReader(configuration, instance_name)

Bases: primrose.base.reader.AbstractReader

A reader that explicitly reads from relational DB using SQL and is able to run pd.read_sql.

abstract get_connection()

return a database connection, one that is compatible with pd.read_sql
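As an illustration of what a get_connection implementation might return, the sketch below uses an in-memory SQLite database from the standard library. The table and rows are made up, and this standalone function is hypothetical rather than part of primrose.

```python
import sqlite3


def get_connection():
    """Hypothetical implementation: an in-memory SQLite database."""
    return sqlite3.connect(":memory:")


conn = get_connection()
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.commit()

# With pandas installed, pd.read_sql("SELECT * FROM users", conn) would
# return these rows as a DataFrame; a plain cursor is used here so the
# sketch has no third-party dependencies.
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'ann'), (2, 'bob')]
```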

static necessary_config(node_config)

Return a list of necessary configuration keys within the implementation

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

After adding this list, validation automatically occurs before instantiation in the pipeline factory.

Returns

set of keys necessary to run implementation

query_db(query, conn)

Query the db using BigQuery logic if specified.

Parameters
  • query (str) – SQL query string

  • conn (DB connection) – database connection

Returns

dataframe

Return type

dataframe (DataFrame)

run(data_object)

run SQL queries into pandas dataframes

Parameters

data_object (DataObject) – instance of DataObject

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): terminate the DAG?

Return type

(tuple)

primrose.base.success module

Abstract base class for a clean up node, such as signalling success

Author(s):

Michael Skarlinski (michael.skarlinski@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.success.AbstractSuccess(configuration, instance_name)

Bases: primrose.base.node.AbstractNode

Ability to cleanup, such as notify success

primrose.base.transformer module

Module with abstract transformer class to specify interface needed for transformers

Author(s):

Michael Skarlinski (michael.skarlinski@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.transformer.AbstractTransformer

Bases: abc.ABC

Serializable object that can be strung together within a pipeline

abstract fit(data)

User implements fit operation on a single data element from a data_object

Parameters

data (object) – some data

Returns

data

fit_transform(data)

fit then transform data

Parameters

data (object) – input data

Returns

data, transformed

abstract transform(data)

User implements internal transform function which operates on a single data element from a data_object

Parameters

data (object) – input data

Returns

data, transformed
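The fit / transform / fit_transform split can be sketched with a small stand-in base class. TransformerSketch mirrors the interface described here but is not the real AbstractTransformer, and MeanCenter is a hypothetical subclass operating on a plain list.

```python
from abc import ABC, abstractmethod


class TransformerSketch(ABC):
    """Stand-in for AbstractTransformer; hypothetical."""

    @abstractmethod
    def fit(self, data):
        """Learn whatever state transform needs; return the data."""

    @abstractmethod
    def transform(self, data):
        """Apply the learned state to the data."""

    def fit_transform(self, data):
        # Default behavior: fit on the data, then transform the same data.
        self.fit(data)
        return self.transform(data)


class MeanCenter(TransformerSketch):
    def __init__(self):
        self.mean = None

    def fit(self, data):
        self.mean = sum(data) / len(data)
        return data

    def transform(self, data):
        return [x - self.mean for x in data]


centerer = MeanCenter()
train = centerer.fit_transform([1.0, 2.0, 3.0])
held_out = centerer.transform([4.0])
print(train, held_out)  # [-1.0, 0.0, 1.0] [2.0]
```

The held-out value is centered with the mean learned from the training data, which is why fitted state lives on the transformer object.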

primrose.base.transformer_sequence module

a simple container for a list of transformers

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.transformer_sequence.TransformerSequence(sequence=[])

Bases: object

A container for list of transformers

add(transformer)

add a transformer to the sequence

Returns

nothing. Side effect is to add transformer to the list

Raises

Exception if transformer is not a transformer

transformers()

generator: yield each transformer

Yields

yields a transformer from the list
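A container with this add / transformers interface could be sketched as follows. SequenceSketch and TransformerBase are hypothetical stand-ins, and the type check and the choice of TypeError are assumptions about how "not a transformer" might be detected.

```python
class TransformerBase:
    """Stand-in for AbstractTransformer; hypothetical."""


class SequenceSketch:
    """Minimal container for an ordered list of transformers."""

    def __init__(self, sequence=None):
        # Copy the input so instances never share one mutable list.
        self.sequence = list(sequence) if sequence is not None else []

    def add(self, transformer):
        if not isinstance(transformer, TransformerBase):
            raise TypeError("transformer must be a TransformerBase instance")
        self.sequence.append(transformer)

    def transformers(self):
        # Generator: yields transformers in insertion order.
        for transformer in self.sequence:
            yield transformer


first, second = TransformerBase(), TransformerBase()
seq = SequenceSketch()
seq.add(first)
seq.add(second)
ordered = list(seq.transformers())

try:
    seq.add("not a transformer")
    rejected = False
except TypeError:
    rejected = True
print(len(ordered), rejected)  # 2 True
```

The sketch uses None as the default argument and copies the list, which avoids the shared-state pitfall that a mutable default such as sequence=[] can cause in Python.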

primrose.base.writer module

A module that writes something somewhere

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.base.writer.AbstractWriter(configuration, instance_name)

Bases: primrose.base.node.AbstractNode

write some data somewhere

Module contents