primrose package¶
Subpackages¶
- primrose.base package
- Submodules
- primrose.base.conditional_path_node module
- primrose.base.model module
- primrose.base.node module
- primrose.base.pipeline module
- primrose.base.postprocess module
- primrose.base.reader module
- primrose.base.search_engine module
- primrose.base.sql_reader module
- primrose.base.success module
- primrose.base.transformer module
- primrose.base.transformer_sequence module
- primrose.base.writer module
- Module contents
- primrose.cleanup package
- primrose.conditionalpath package
- primrose.configuration package
- primrose.dag package
- primrose.dataviz package
- primrose.models package
- primrose.pipelines package
- primrose.readers package
- Submodules
- primrose.readers.csv_reader module
- primrose.readers.database_helper module
- primrose.readers.deserializer module
- primrose.readers.dill_reader module
- primrose.readers.gcs_dill_reader module
- primrose.readers.mysql_helper module
- primrose.readers.mysql_reader module
- primrose.readers.oracle_reader module
- primrose.readers.postgres_helper module
- primrose.readers.postgres_reader module
- primrose.readers.redshift_reader module
- primrose.readers.sklearn_dataset_reader module
- primrose.readers.sqlite_reader module
- Module contents
- primrose.templates package
- primrose.transformers package
- primrose.writers package
Submodules¶
primrose.dag_runner module¶
Run the DAG: gets list of nodes to traverse then calls run(data_object) on each
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.dag_runner.
DagRunner
(configuration)¶ Bases:
object
class that runs the DAG: gets the list of nodes to traverse and then asks them to run
-
cache_data_object
(data_object)¶ cache the data object
- Parameters
data_object (DataObject) – instance of DataObject
- Returns
whether it was cached (bool)
-
check_for_upstream
(sequence)¶ check for any upstream paths within the input sequence. For example, suppose a reader flows to a writer. It would not make sense to run the writer before the reader.
- Parameters
sequence (list) – list of node names
- Raises
Exception if any upstream paths are found
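The ordering rule above can be sketched as follows. This is a minimal illustration, not primrose's implementation: it assumes the DAG is available as a plain dict mapping each node name to the set of its direct downstream nodes, and both the `edges` dict and the helper name `find_upstream_violations` are hypothetical.

```python
def find_upstream_violations(sequence, edges):
    """Return (earlier, later) pairs where `later` is upstream of `earlier`.

    `edges` (hypothetical) maps a node name to the set of node names
    directly downstream of it.
    """
    def descendants(node):
        # iterative depth-first walk collecting everything reachable from node
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for child in edges.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    violations = []
    for i, earlier in enumerate(sequence):
        for later in sequence[i + 1:]:
            # `later` runs after `earlier`, so `earlier` must not depend on it
            if earlier in descendants(later):
                violations.append((earlier, later))
    return violations


edges = {"reader": {"writer"}}
ok = find_upstream_violations(["reader", "writer"], edges)
bad = find_upstream_violations(["writer", "reader"], edges)
```

Here `ok` is empty, while `bad` reports that `writer` was scheduled before its upstream `reader`. A caller could raise an exception whenever the returned list is non-empty.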
-
create_data_object
()¶ restore data_object from cache
- Returns
data_object (DataObject)
-
filter_sequence
(sequence)¶ The user may have specified some subset of sections to run in metadata.section_run. We assume we can’t trust the traverser to limit itself to those sections, so here we limit the sequence, if necessary.
- Parameters
sequence (list) – list of nodes to run in given order
- Returns
complete or subset of input sequence
- Return type
sequence (list)
- Raises
Exception if there are dupes in the sequence, if nodes are not in config, or if we have nodes from other sections.
The latter can happen if we mix up nodes from sections. For example, suppose we have section1 (1 node) and section2 (2 nodes), we want to run section2 and then section1, and we receive the sequence [section2_node1, section1_node1, section2_node2]. It will complain about the partition [section2_node1, section1_node1] [section2_node2], as the nodes are mixed across sections.
-
initial_check_sequence
(sequence)¶ Some checks on the incoming sequence
- Parameters
sequence (list) – list of nodes to run in given order
- Returns
nothing
- Raises
Exception if there are dupes in the sequence, if nodes are not in config, or if we have nodes from other sections.
The latter can happen if we mix up nodes from sections. For example, suppose we have section1 (1 node) and section2 (2 nodes), we want to run section2 and then section1, and we receive the sequence [section2_node1, section1_node1, section2_node2]. It will complain about the partition [section2_node1, section1_node1] [section2_node2], as the nodes are mixed across sections.
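One way to detect the mixed-section case described above is to collapse the sequence into consecutive runs per section and flag any section that appears in more than one run. This is only a sketch under stated assumptions: the `node_to_section` mapping and the helper `find_split_sections` are hypothetical and are not part of primrose.

```python
from itertools import groupby


def find_split_sections(sequence, node_to_section):
    """Return the sections whose nodes are not contiguous in `sequence`."""
    # collapse consecutive nodes from the same section into a single run
    runs = [section for section, _ in groupby(sequence, key=node_to_section.get)]
    # a section that shows up in more than one run has been interleaved
    return sorted({section for section in runs if runs.count(section) > 1})


# hypothetical mapping of node names to their sections
node_to_section = {
    "section1_node1": "section1",
    "section2_node1": "section2",
    "section2_node2": "section2",
}

mixed = find_split_sections(
    ["section2_node1", "section1_node1", "section2_node2"], node_to_section
)
clean = find_split_sections(
    ["section2_node1", "section2_node2", "section1_node1"], node_to_section
)
```

With the sequence from the example, `mixed` names `section2` as the interleaved section, while the contiguous ordering yields an empty list.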
-
run
(dry_run=False)¶ Run the whole DAG. Optionally, you can pass dry_run=True,
which will log what would be run and in what order but not actually run it.
- Parameters
dry_run (bool) – whether to do a dry run
- Returns
data_object (DataObject): DataObject instance
node (Node): last node run
-
primrose.data_object module¶
Module to handle bookkeeping of data
- Author(s):
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.data_object.
DataObject
(config)¶ Bases:
object
DataObject: a container for “data” (strings, dicts, arbitrary objects etc)
-
DATA_KEY
= 'data'¶
-
DEFAULT_RESPONSE_TYPE
= 'kv'¶
-
__repr__
()¶ string representation of the class
- Returns
string representation
-
add
(requestor, data, key='data', overwrite=False)¶ for requestor’s instance_name, set key:data in storage
- Parameters
requestor (Node) – object (model, pipeline, writer, etc.) that has an instance_name attribute
data (object) – some object
key (string) – if not supplied default data key is used
- Returns
nothing.
-
get
(instance_name, pop_data=False, rtype='kv')¶ get some data from storage, optionally popping it off.
- Parameters
instance_name (str) – name of node in DAG
pop_data (bool) – boolean, whether to pop data from storage
rtype (DataObjectResponseType) – DataObjectResponseType value, specifying response type
- Returns
data of desired DataObjectResponseType, selected with rtype
- Raises
Exception if unrecognized rtype or keys
-
get_filtered_upstream_data
(instance_name, filter_for_key)¶ Upstream data where first level dict keys are first checked for the presence of a filter key
- Parameters
instance_name (str) – name of instance to look upstream from
filter_for_key (str) – the key data was saved with (not instance name but the data value key)
- Returns
dictionary of stored data for that instance if only one matching dict, if more than one valid dictionary then return list of dicts, None otherwise
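The return rule above (one dict, a list of dicts, or None) can be sketched with plain dictionaries. This is an illustration only, not primrose's code: the helper name `filter_upstream` and the shape of the `upstream` argument (instance name mapped to that instance's data dict) are assumptions.

```python
def filter_upstream(upstream, filter_for_key):
    """Keep upstream data dicts that contain `filter_for_key` at the first level."""
    matches = [
        data
        for data in upstream.values()
        if isinstance(data, dict) and filter_for_key in data
    ]
    if not matches:
        return None          # nothing upstream carries the key
    if len(matches) == 1:
        return matches[0]    # exactly one match: return the dict itself
    return matches           # several matches: return them all as a list


# hypothetical upstream data keyed by instance name
upstream = {
    "reader_a": {"data": [1, 2], "query": "select 1"},
    "reader_b": {"data": [3, 4]},
}

one = filter_upstream(upstream, "query")
many = filter_upstream(upstream, "data")
none = filter_upstream(upstream, "model")
```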
-
get_upstream_data
(instance_name, pop_data=False, rtype='kv', operation_type_filter=None)¶ Return data from upstream source(s), choose to pop or not from the dict
Note
returns dictionary, where keys are instance_names and each value is a dictionary. However, if i) there is only 1 upstream key and ii) value_only=True then return the value only.
This option is useful if you expect 1 upstream source only and it returns a single artifact, such as a single dataframe. In that case just the dataframe is returned
- Returns
object (type depends on DEFAULT_RESPONSE_TYPE)
- Raises
Exception if no upstream data found
-
static
read_from_cache
(filename)¶ restore DataObject from dill-cached file
- Parameters
filename (str) – cache filename
- Returns
DataObject instance from cache
- Return type
data_object (DataObject)
-
upstream_keys
(instance_name, operation_type_filter=None)¶ get list of upstream node names for a given input requestor node
- Parameters
instance_name (str) – name of requestor
operation_type_filter (optional) – type of operation type to filter in
- Returns
list of keys, if any
-
write_to_cache
(filename)¶ write data_object (self) to dill-cache
- Returns
nothing. Side effect is to cache object to file
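The cache methods above rely on dill. As a sketch of the same write-then-restore round trip, the snippet below swaps in the standard library's `pickle` so that it has no third-party dependency, and it caches a plain dictionary rather than a DataObject. The module-level function names mirror the methods above for readability only, and the file name is hypothetical.

```python
import os
import pickle
import tempfile


def write_to_cache(obj, filename):
    # serialize the object to a file; the side effect is the file on disk
    with open(filename, "wb") as handle:
        pickle.dump(obj, handle)


def read_from_cache(filename):
    # restore whatever object was serialized to the file
    with open(filename, "rb") as handle:
        return pickle.load(handle)


# plain dict standing in for the stored data
original = {"corpus_reader": {"data": [1, 2, 3]}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data_object.pkl")  # hypothetical cache file
    write_to_cache(original, path)
    restored = read_from_cache(path)
```

The restored object is an equal but distinct copy of the original, which is what makes a cache usable for resuming a run from a later section.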
-
-
class
primrose.data_object.
DataObjectResponseType
¶ Bases:
enum.Enum
Type of object when getting data from DataObject
- INSTANCE_KEY_VALUE = dictionary of instance_name keys and their data dictionaries:
{‘instance_name’: {‘key’:value}, ‘instance_name2’: {‘key2’:value2}, … } e.g. {‘corpus_reader’: {‘data’: dataframe}}. This is useful if there is a set of upstream data arriving from multiple sources.
- KEY_VALUE = dictionary of data for a given instance name: {‘key’:value}
e.g. {‘data’: dataframe} or {‘data’: dataframe, ‘query’: ‘select * from table’}. This is useful if there are multiple keys for a given instance_name or if you want to explicitly check against expected keys.
- VALUE = value only (for 1st or only instance name and for only key):
e.g. dataframe. This is useful if you know a node in the DAG has only a single upstream source and only a single value. Readers are often a good example, as they typically read in and provide a single data frame.
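The three response shapes listed above can be illustrated with plain dictionaries. This sketch does not call primrose: the helper `shape` is hypothetical, a string stands in for a dataframe, and the 'kv' and 'v' branches assume exactly one upstream instance (and, for 'v', exactly one key).

```python
def shape(storage, rtype):
    """Reshape `storage` ({instance_name: {key: value}}) by response type."""
    if rtype == "ikv":
        # instance -> key -> value, unchanged
        return storage
    if len(storage) != 1:
        raise ValueError("this sketch expects one instance for 'kv' and 'v'")
    key_values = next(iter(storage.values()))
    if rtype == "kv":
        # the key -> value dictionary of the single instance
        return key_values
    if rtype == "v":
        if len(key_values) != 1:
            raise ValueError("this sketch expects one key for 'v'")
        # the bare value of the single key
        return next(iter(key_values.values()))
    raise ValueError(f"unrecognized rtype: {rtype}")


storage = {"corpus_reader": {"data": "dataframe placeholder"}}

as_ikv = shape(storage, "ikv")
as_kv = shape(storage, "kv")
as_v = shape(storage, "v")
```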
-
INSTANCE_KEY_VALUE
= 'ikv'¶
-
KEY_VALUE
= 'kv'¶
-
VALUE
= 'v'¶
-
values
= <function DataObjectResponseType.values>¶
primrose.node_factory module¶
Singleton Factory where one can register objects/classes for instantiation
- Author(s):
Michael Skarlinski (michael.skarlinski@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)
-
class
primrose.node_factory.
NodeFactory
¶ Bases:
object
Singleton Factory where one can register objects/classes for instantiation
-
CLASS_KEY
= 'class'¶
-
CLASS_PREFIX
= 'class_prefix'¶
-
__getattr__
(name)¶ getattr with instance name
- Parameters
name (str) – name of the instance
- Returns
result of getattr
-
instance
= None¶
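A singleton factory with a class-level `instance` attribute is commonly written along the following lines. This is a generic sketch of the pattern, not NodeFactory's actual code: the class `FactorySketch` and its `register` and `instantiate` methods are hypothetical names chosen for illustration.

```python
class FactorySketch:
    """Hypothetical singleton registry of classes that can be instantiated by name."""

    instance = None  # the single shared instance, created on first construction

    def __new__(cls):
        if cls.instance is None:
            cls.instance = super().__new__(cls)
            cls.instance.registry = {}
        return cls.instance

    def register(self, name, klass):
        # remember a class under a lookup name
        self.registry[name] = klass

    def instantiate(self, name, *args, **kwargs):
        # build an object from a previously registered class
        return self.registry[name](*args, **kwargs)


first = FactorySketch()
first.register("list", list)

second = FactorySketch()  # same object as `first`, so the registration is visible
made = second.instantiate("list", "ab")
```

Because every construction returns the same object, anything registered through one reference is available through all of them.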
-
primrose.util module¶
Enum to specify type of run mode: train, predict, and eval
- Author(s):
Michael Skarlinski (michael.skarlinski@weightwatchers.com)
Carl Anderson (carl.anderson@weightwatchers.com)