primrose.base package¶
Submodules¶
primrose.base.conditional_path_node module¶
Module with abstract conditional path class to specify interface needed for future conditional path nodes
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.conditional_path_node.
AbstractConditionalPath
(configuration, instance_name)¶ Bases:
primrose.base.node.AbstractNode
A class that supports conditional pathing through the DAG. After running, prune() can provide a list of destination nodes, representing start of paths to prune. DAGRunner can then prune those nodes, and all paths downstream of those nodes, from the DAG
-
all_nodes_to_prune
()¶ What are all the nodes we should prune from the DAG?
Note
This calls destinations_to_prune() and then uses the DAG to identify the complete subgraphs starting at those destinations.
- Returns
set of all nodes to prune
-
abstract
destinations_to_prune
()¶ Which destinations, if any, should we prune from DAG?
- Returns
list of destination nodes to prune from the DAG, None otherwise
-
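The relationship between destinations_to_prune() and all_nodes_to_prune() can be sketched as a downstream traversal. This is a minimal illustration, not primrose's implementation: the adjacency dict `dag` and the standalone function below are hypothetical stand-ins for the DAG object the real class consults.

```python
from collections import deque


def all_nodes_to_prune(dag, destinations):
    """Return the destinations plus every node reachable downstream of them.

    `dag` is a hypothetical adjacency mapping: node name -> list of child names.
    `destinations` plays the role of the destinations_to_prune() result.
    """
    if not destinations:  # covers both None and an empty list
        return set()
    pruned = set()
    queue = deque(destinations)
    while queue:
        node = queue.popleft()
        if node in pruned:
            continue  # already visited via another path
        pruned.add(node)
        queue.extend(dag.get(node, []))
    return pruned


# Hypothetical DAG: a reader feeds two branches that each end in a writer.
dag = {
    "reader": ["branch_a", "branch_b"],
    "branch_a": ["writer_a"],
    "branch_b": ["writer_b"],
    "writer_a": [],
    "writer_b": [],
}
pruned = all_nodes_to_prune(dag, ["branch_b"])
print(sorted(pruned))  # ['branch_b', 'writer_b']
```

Returning a set mirrors the documented return value and keeps the result free of duplicates when two pruned paths converge on the same node.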
primrose.base.model module¶
Module with abstract model class to specify interface needed for future models
- Author(s):
Michael Skarlinski (michael.skarlinski@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.model.
AbstractModel
(configuration, instance_name)¶ Bases:
primrose.base.node.AbstractNode
Model class should be able to train, evaluate or predict
-
abstract
eval_model
(data_object)¶ Evaluate a previously trained model performance using labeled data
Method should be able to load a serialized model if necessary (load_model method), or work from a model trained in-scope with the train_model method. This method should calculate and store some aspect of model error into attributes that are serialized with the save_model method
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
-
abstract static
necessary_config
(node_config)¶ Return a list of necessary configuration keys within the implementation
- Parameters
node_config (dict) – set of parameters / attributes for the node
Note
After adding this list, validation automatically occurs before instantiation in the pipeline factory.
- Returns
set of keys necessary to run implementation
-
abstract
predict
(data_object)¶ Make predictions using a pre-trained model on the features in the feature_df dataframe
Using a pre-trained model (either from an in-scope run of train_model or a call to the load_model method) this method should append predictions to each row of features in the input, feature_df. This method may also add feature importance columns, for importance analysis
- Parameters
data_object – DataObject instance
- Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
- Return type
(tuple)
-
run
(data_object)¶ run the model
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
-
abstract
train_model
(data_object)¶ Train an internal model attribute, then save the trained model with the save_model method
Using the training features in feature_df (pandas dataframe object) and the target_variable (pandas series), cross-validation or other training scripts will happen in this method, according to the parameters in parameters
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
-
primrose.base.node module¶
Top level notion of a node in the graph
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.node.
AbstractNode
(configuration, instance_name)¶ Bases:
abc.ABC
-
abstract static
necessary_config
(node_config)¶ Return a list of necessary configuration keys within the implementation
- Parameters
node_config (dict) – set of parameters / attributes for the node
After adding this list, validation automatically occurs before instantiation in the configuration
- Returns
set of keys necessary to run implementation
-
abstract
run
(data_object)¶ run the node. For a reader, that means read, for a writer that means write etc.
- Parameters
data_object (DataObject) – DataObject instance
- Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
- Return type
(tuple)
-
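The node contract described above (a static necessary_config plus a run method returning a tuple) might look like the following in a concrete subclass. This is a sketch under assumptions: `SketchNode` is a stand-in written for this example rather than primrose's AbstractNode, and both the configuration and the data object are simplified to plain dicts.

```python
from abc import ABC, abstractmethod


class SketchNode(ABC):
    """Stand-in mirroring the documented interface; not primrose's class."""

    def __init__(self, configuration, instance_name):
        self.configuration = configuration
        self.instance_name = instance_name

    @staticmethod
    @abstractmethod
    def necessary_config(node_config):
        """Return the set of configuration keys this node requires."""

    @abstractmethod
    def run(self, data_object):
        """Do the node's work and return (data_object, terminate)."""


class UppercaseNode(SketchNode):
    """Hypothetical node that upper-cases one field of the data object."""

    @staticmethod
    def necessary_config(node_config):
        return {"field"}

    def run(self, data_object):
        field = self.configuration["field"]
        data_object[field] = data_object[field].upper()
        terminate = False  # True would ask the runner to stop the DAG
        return data_object, terminate


node_config = {"field": "greeting"}

# Validate required keys before instantiating, as the note above describes.
missing = UppercaseNode.necessary_config(node_config) - set(node_config)
if missing:
    raise ValueError(f"missing configuration keys: {sorted(missing)}")

node = UppercaseNode(node_config, "uppercase")
data_object, terminate = node.run({"greeting": "hello"})
print(data_object, terminate)  # {'greeting': 'HELLO'} False
```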
primrose.base.pipeline module¶
Module with abstract pipeline class to specify interface needed for future pipelines
- Author(s):
Michael Skarlinski (michael.skarlinski@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.pipeline.
AbstractPipeline
(configuration, instance_name)¶ Bases:
primrose.base.node.AbstractNode
Pipeline class should have a defined pipeline that it executes and the ability to transform raw data
-
check_for_upstream_transformers
(data_object)¶ Examine the upstream data_object for any TransformerSequence
if not found, then initialize a new TransformerSequence
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
TransformerSequence
-
execute_pipeline
(input_, mode)¶ Run a TransformerSequence of functions with chained input and output data
- Parameters
input_ (object) – input data (usually a pandas dataframe)
mode – enum object for fit, transform, or fit_transform
- Returns
transformed data (usually a pandas dataframe) after running through all functions in the pipeline
-
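To make the chaining in execute_pipeline concrete, here is a minimal sketch of running a sequence of transformers under a mode. Everything in it is an assumption for illustration: the `Mode` enum, the two toy transformers, and the standalone function are hypothetical, and plain lists stand in for dataframes.

```python
from enum import Enum


class Mode(Enum):
    FIT = "FIT"
    TRANSFORM = "TRANSFORM"
    FIT_TRANSFORM = "FIT_TRANSFORM"


class ShiftToZero:
    """Toy transformer: learns the minimum in fit, subtracts it in transform."""

    def __init__(self):
        self.minimum = None

    def fit(self, data):
        self.minimum = min(data)
        return data

    def transform(self, data):
        return [x - self.minimum for x in data]

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)


class ScaleByMax:
    """Toy transformer: learns the maximum in fit, divides by it in transform."""

    def __init__(self):
        self.maximum = None

    def fit(self, data):
        self.maximum = max(data)
        return data

    def transform(self, data):
        return [x / self.maximum for x in data]

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)


def execute_pipeline(transformers, input_, mode):
    """Feed each transformer's output into the next one, according to mode."""
    data = input_
    for transformer in transformers:
        if mode is Mode.FIT:
            data = transformer.fit(data)
        elif mode is Mode.TRANSFORM:
            data = transformer.transform(data)
        elif mode is Mode.FIT_TRANSFORM:
            data = transformer.fit_transform(data)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return data


sequence = [ShiftToZero(), ScaleByMax()]
train = execute_pipeline(sequence, [2, 4, 6], Mode.FIT_TRANSFORM)
test = execute_pipeline(sequence, [4, 10], Mode.TRANSFORM)
print(train)  # [0.0, 0.5, 1.0]
print(test)   # [0.5, 2.0]
```

The second call reuses the minimum and maximum learned from the training data, which is why the test values can fall outside the 0 to 1 range.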
fit_transform
(data_object)¶ Clean/transform or filter data using a pipeline of functions
The method should also cache results and report on sizing for debugging. fit_transform must store the information necessary for data transformations on test data, so any encodings or model-based imputations must be cached in this method, to be called when the transform method is used.
- Parameters
data_object (DataObject) – DataObject instance
- Returns
DataObject instance
- Return type
data_object (DataObject)
-
init_pipeline
()¶ Initialize the pipeline if no pipeline object is found in the upstream data objects
- Returns
TransformerSequence
-
static
necessary_config
(node_config)¶ Return a list of necessary configuration keys within the implementation
- Parameters
node_config (dict) – set of parameters / attributes for the node
After adding this list, validation automatically occurs before instantiation in the pipeline factory.
- Returns
set of keys necessary to run implementation
-
run
(data_object)¶ Run pipeline on the data object
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
- Return type
(tuple)
-
abstract
transform
(data_object)¶ Clean/transform or filter data using a pipeline of functions and any cached objects from fit_transform
The method should cache post-transform results and report on sizing for debugging. transform uses the cached objects stored in the fit_transform call to transform data. It’s likely to be used for live predictions or test (hold-out) data.
- Parameters
data_object (DataObject) – DataObject instance
- Returns
data_object (DataObject)
-
-
class
primrose.base.pipeline.
PipelineModeType
¶ Bases:
enum.Enum
Mode when performing the pipeline
Note
FIT = fit data to transformer object only
TRANSFORM = transform data only from (previously) fit transformers in a pipeline
FIT_TRANSFORM = fit data then transform data in a pipeline
-
FIT
= 'FIT'¶
-
FIT_TRANSFORM
= 'FIT_TRANSFORM'¶
-
TRANSFORM
= 'TRANSFORM'¶
-
names
= <function PipelineModeType.names>¶
-
values
= <function PipelineModeType.values>¶
-
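An enum with string-valued members and names/values helpers, as listed above, could be sketched like this. `ModeSketch` is a hypothetical stand-in, and declaring the helpers as static methods is an assumption of this example; the listing above only shows that they are functions attached to the class.

```python
from enum import Enum


class ModeSketch(Enum):
    FIT = "FIT"
    TRANSFORM = "TRANSFORM"
    FIT_TRANSFORM = "FIT_TRANSFORM"

    @staticmethod
    def names():
        """List member names in definition order."""
        return [member.name for member in ModeSketch]

    @staticmethod
    def values():
        """List member values in definition order."""
        return [member.value for member in ModeSketch]


print(ModeSketch.names())   # ['FIT', 'TRANSFORM', 'FIT_TRANSFORM']
print(ModeSketch.values())  # ['FIT', 'TRANSFORM', 'FIT_TRANSFORM']

# A mode read from configuration as a string can be looked up by value.
mode = ModeSketch("FIT_TRANSFORM")
print(mode is ModeSketch.FIT_TRANSFORM)  # True
```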
primrose.base.postprocess module¶
Module with abstract postprocess class to specify interface needed for future postprocesses
- Author(s):
Michael Skarlinski (michael.skarlinski@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.postprocess.
AbstractPostprocess
(configuration, instance_name)¶ Bases:
primrose.base.node.AbstractNode
Postprocess module which must have a postprocess method to send data to an external source
primrose.base.reader module¶
A module that reads something from somewhere
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.reader.
AbstractReader
(configuration, instance_name)¶ Bases:
primrose.base.node.AbstractNode
read some data from somewhere
primrose.base.search_engine module¶
Abstract base class that performs TFIDF on some corpus
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.search_engine.
AbstractSearchEngine
(configuration, instance_name)¶ Bases:
primrose.base.model.AbstractModel
an abstract search engine
-
cosine_similarity_matrix
()¶ compute the cosine similarities between all document pairs in the corpus
- Returns
square matrix of cosine similarities where index of matrix is index of corpus IDs
- Return type
matrix (numpy)
-
eval_model
(data_object)¶ evaluate the model
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
-
static
necessary_config
(node_config)¶ Return a list of necessary configuration keys for AbstractSearchEngine
- Parameters
node_config (dict) – set of parameters / attributes for the node
Note
id_key: key used in the corpus object for ids
doc_key: key used in the corpus object for docs
- Returns
set of keys necessary to run AbstractSearchEngine
-
predict
(data_object)¶ make predictions from this corpus, meaning create cosine similarity matrix
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
-
abstract
tokenize
(s)¶ Given some string, tokenize it
- Parameters
s (str) – input string
- Returns
list of tokens
-
train_model
(data_object)¶ train the model which means run fit_transform on a TFIDF model
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
instance of DataObject
- Return type
data_object (DataObject)
-
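The idea behind tokenize, train_model, and cosine_similarity_matrix can be illustrated with a small standard-library sketch: build TF-IDF vectors for each document, then compare every pair. This is not the class's implementation. The whitespace tokenizer, the smoothed IDF formula, and the sample corpus are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(s):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return s.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption), so no term ends up with zero weight.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: n * idf[term] for term, n in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_similarity_matrix(docs):
    """Square matrix whose row and column order follows the corpus order."""
    vectors = tfidf_vectors(docs)
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical corpus keyed by document id.
corpus = {"a": "red apple pie", "b": "green apple tart", "c": "blue ocean wave"}
ids = list(corpus)
matrix = cosine_similarity_matrix([corpus[i] for i in ids])

for doc_id, row in zip(ids, matrix):
    print(doc_id, [round(x, 3) for x in row])
```

Documents "a" and "b" share only the term "apple", so their similarity is positive but below 1, while "c" shares no terms with either and scores 0.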
primrose.base.sql_reader module¶
Abstract reader that uses SQL
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.sql_reader.
AbstractSqlReader
(configuration, instance_name)¶ Bases:
primrose.base.reader.AbstractReader
A reader that explicitly reads from relational DB using SQL and is able to run pd.read_sql.
-
abstract
get_connection
()¶ return a database connection, one that is compatible with pd.read_sql
-
static
necessary_config
(node_config)¶ Return a list of necessary configuration keys within the implementation
- Parameters
node_config (dict) – set of parameters / attributes for the node
Note
After adding this list, validation automatically occurs before instantiation in the pipeline factory.
- Returns
set of keys necessary to run implementation
-
query_db
(query, conn)¶ Query the db using BigQuery logic if specified.
- Parameters
query (str) – SQL query string
conn (DB connection) – database connection
- Returns
dataframe
- Return type
dataframe (DataFrame)
-
run
(data_object)¶ run SQL queries into pandas dataframes
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
- Return type
(tuple)
-
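As an illustration of a connection that pd.read_sql accepts, the sketch below uses an in-memory SQLite database from the standard library. The table, the data, and both function bodies are hypothetical, and the fallback to plain rows when pandas is not installed is an addition for this example only.

```python
import sqlite3

try:
    import pandas as pd  # optional here; the reader described above uses it
except ImportError:
    pd = None


def get_connection():
    """Hypothetical get_connection(): an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "grace")]
    )
    conn.commit()
    return conn


def query_db(query, conn):
    """Run the query, returning a dataframe when pandas is available."""
    if pd is not None:
        return pd.read_sql(query, conn)
    return conn.execute(query).fetchall()  # fallback: list of row tuples


conn = get_connection()
result = query_db("SELECT id, name FROM users ORDER BY id", conn)
print(len(result))  # 2
conn.close()
```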
primrose.base.success module¶
Abstract base class for a clean up node, such as signalling success
- Author(s):
Michael Skarlinski (michael.skarlinski@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.success.
AbstractSuccess
(configuration, instance_name)¶ Bases:
primrose.base.node.AbstractNode
Ability to cleanup, such as notify success
primrose.base.transformer module¶
Module with abstract transformer class to specify interface needed for transformers
- Author(s):
Michael Skarlinski (michael.skarlinski@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.transformer.
AbstractTransformer
¶ Bases:
abc.ABC
Serializable object that can be strung together within a pipeline
-
abstract
fit
(data)¶ User implements fit operation on a single data element from a data_object
- Parameters
data (object) – some data
- Returns
data
-
fit_transform
(data)¶ fit then transform data
- Parameters
data (object) – input data
- Returns
data, transformed
-
abstract
transform
(data)¶ User implements internal transform function which operates on a single data element from a data_object
- Parameters
data (object) – input data
- Returns
data, transformed
-
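A concrete transformer following the fit / transform / fit_transform split above could be sketched as follows. `SketchTransformer` is a stand-in base class written for this example, not primrose's AbstractTransformer, and `MeanCenter` is a hypothetical subclass operating on plain lists of numbers.

```python
from abc import ABC, abstractmethod


class SketchTransformer(ABC):
    """Stand-in mirroring the documented interface."""

    @abstractmethod
    def fit(self, data):
        """Learn whatever state transform() will need."""

    @abstractmethod
    def transform(self, data):
        """Apply the learned state to data."""

    def fit_transform(self, data):
        """Fit, then transform the same data."""
        self.fit(data)
        return self.transform(data)


class MeanCenter(SketchTransformer):
    """Hypothetical transformer that subtracts the training mean."""

    def __init__(self):
        self.mean = None

    def fit(self, data):
        self.mean = sum(data) / len(data)
        return data

    def transform(self, data):
        if self.mean is None:
            raise RuntimeError("fit must be called before transform")
        return [x - self.mean for x in data]


centerer = MeanCenter()
train = centerer.fit_transform([1.0, 2.0, 3.0])  # learns mean = 2.0
test = centerer.transform([4.0])                 # reuses the learned mean
print(train)  # [-1.0, 0.0, 1.0]
print(test)   # [2.0]
```

Keeping the learned state on the instance is what lets a fitted transformer be saved and later applied to new data without refitting.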
primrose.base.transformer_sequence module¶
a simple container for a list of transformers
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.transformer_sequence.
TransformerSequence
(sequence=[])¶ Bases:
object
A container for list of transformers
-
add
(transformer)¶ add a transformer to the sequence
- Returns
nothing. Side effect is to add transformer to the list
- Raises
Exception – if transformer is not a transformer
-
transformers
()¶ generator: yield each transformer
- Yields
yields a transformer from the list
-
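A container with the add and transformers behaviour described above might be sketched like this. Both classes are hypothetical stand-ins. One deliberate difference from the documented signature: the sketch defaults `sequence` to None rather than `[]`, because a mutable default list in Python is shared between every call that relies on it.

```python
class SketchTransformer:
    """Minimal stand-in for a transformer base class."""

    def fit(self, data):
        return data

    def transform(self, data):
        return data


class SketchSequence:
    """Hypothetical container holding an ordered list of transformers."""

    def __init__(self, sequence=None):
        # None instead of [] so that instances never share one default list.
        self.sequence = []
        for transformer in sequence or []:
            self.add(transformer)

    def add(self, transformer):
        """Append a transformer, rejecting anything of the wrong type."""
        if not isinstance(transformer, SketchTransformer):
            raise Exception("transformer must extend SketchTransformer")
        self.sequence.append(transformer)

    def transformers(self):
        """Yield each transformer in insertion order."""
        for transformer in self.sequence:
            yield transformer


first = SketchSequence()
second = SketchSequence()
first.add(SketchTransformer())

print(len(list(first.transformers())))   # 1
print(len(list(second.transformers())))  # 0

try:
    second.add("not a transformer")
except Exception as error:
    print(error)  # transformer must extend SketchTransformer
```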
primrose.base.writer module¶
A module that writes something somewhere
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.base.writer.
AbstractWriter
(configuration, instance_name)¶ Bases:
primrose.base.node.AbstractNode
write some data somewhere