primrose.configuration package

Submodules

primrose.configuration.configuration module

Module to implement a Configuration parser which enhances parsing functionality of configparser

Author(s):

Michael Skarlinski (michael.skarlinski@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.configuration.configuration.Configuration(config_location, is_dict_config=False, dict_config=None)

Bases: object

Stores the user-defined configuration for a primrose job

check_config()

check the configuration as much as we can as early as we can

Raises

various exceptions if any checks fail

check_metadata()

checks some dependencies among metadata keys

Raises

ConfigurationError if issues found

check_sections()

Check that all the sections in implementation are supported ones. Either the user supplied metadata.section_registry, or they are using default sections

Raises

ConfigurationError if metadata.section_registry is declared and sections from implementation were not found in metadata, or vice versa, or if using default operations but sections were found that are not supported

config_for_instance(instance_name)

get the configuration for a given node / instance_name

Returns

JSON chunk for this instance

dict_raise_on_duplicates(ordered_pairs)

Reject duplicate keys in JSON string, i.e., sections and node names.

Parameters

ordered_pairs (list) – list of key:values from the config. Examples:

  • [('class', 'CsvReader'), ('filename', 'data/tennis.csv'), ('destinations', ['write_output'])]

  • [('read_data', {'class': 'CsvReader', 'filename': 'data/tennis.csv', 'destinations': ['write_output']})]

Returns

dictionary of key (node type) and value (node name)

Return type

dictionary (dict)
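
The duplicate-key check described above can be illustrated with the standard library's json.loads and its object_pairs_hook argument, which receives the same kind of ordered (key, value) list. This is a minimal sketch and not the primrose implementation: the function name reject_duplicate_keys is hypothetical, and ValueError is an assumption, since the documentation does not say which exception is raised.

```python
import json


def reject_duplicate_keys(ordered_pairs):
    """Build a dict from (key, value) pairs, raising on a repeated key."""
    result = {}
    for key, value in ordered_pairs:
        if key in result:
            # ValueError is an assumption; the source does not name the exception.
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


# json.loads calls the hook once per JSON object, innermost objects first.
config_text = '{"read_data": {"class": "CsvReader", "destinations": ["write_output"]}}'
parsed = json.loads(config_text, object_pairs_hook=reject_duplicate_keys)

# The same node name declared twice is rejected instead of silently overwritten.
duplicated_text = '{"read_data": {"class": "CsvReader"}, "read_data": {"class": "CsvWriter"}}'
try:
    json.loads(duplicated_text, object_pairs_hook=reject_duplicate_keys)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

Without such a hook, json.loads keeps only the last value for a repeated key, which is why an explicit check is useful for section and node names.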

static perform_any_config_fragment_substitution(config_str)

Given some configuration file content string, look for substitutions given by $$FILE=path/to/config/file/fragment.json$$ and make the replacements using the filenames provided. For example:

    {
      $$FILE=/tmp/metadata.json$$
      "implementation_config": {
        $$FILE= config/read_write_fragment.json $$
      }
    }

will inject the content of /tmp/metadata.json into the 2nd line of that config.

Parameters

config_str (str) – content of some configuration file that may or may not contain substitution variables

Returns

the post-substituted configuration string

Return type

config_str (str)
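
One way to picture the $$FILE=...$$ substitution is a regular-expression replace that reads each referenced file. The sketch below is an illustration under assumptions, not the library's code: the names FRAGMENT_PATTERN and substitute_fragments are hypothetical, and whitespace around the path is tolerated only because the example above shows it.

```python
import os
import re
import tempfile

# Matches $$FILE=path$$, allowing whitespace around the path.
FRAGMENT_PATTERN = re.compile(r"\$\$FILE=\s*(.*?)\s*\$\$")


def substitute_fragments(config_str):
    """Replace each $$FILE=path$$ marker with the content of the named file."""

    def load_fragment(match):
        # group(1) is the path between "FILE=" and the closing "$$".
        with open(match.group(1), encoding="utf-8") as handle:
            return handle.read()

    return FRAGMENT_PATTERN.sub(load_fragment, config_str)


# Demonstration with a throwaway fragment file.
with tempfile.TemporaryDirectory() as tmp_dir:
    fragment_path = os.path.join(tmp_dir, "metadata.json")
    with open(fragment_path, "w", encoding="utf-8") as handle:
        handle.write('"metadata": {},')
    template = "{ $$FILE=" + fragment_path + '$$ "implementation_config": {} }'
    expanded = substitute_fragments(template)
```

Here expanded is the string { "metadata": {}, "implementation_config": {} }, which can then be parsed as ordinary JSON.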

sections_in_order()

Return list of section names in order, either explicitly from metadata or from default Enum order

Note

If there is a non-empty section_run list in metadata, return that; else if there is a non-empty section_registry in metadata, return that; otherwise return the sections present from the default OperationType enum.

We need this method because the config sections are a dictionary not a list so we can’t guarantee order of keys. This method imposes an expected order.

Returns

tuple containing:

section names (list): list of sections

source (str): where did the list come from? section_run, section_registry, or default?

Return type

(tuple)
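
The precedence in the note above can be sketched as a small function. This is a simplified illustration, not the method itself: it takes plain dictionaries and lists rather than a Configuration object, and the default_order list is a hypothetical ordering, since this page does not state the order of the OperationType enum.

```python
def sections_in_order(metadata, implementation_sections, default_order):
    """Return (section names, source) using section_run, then section_registry, then defaults."""
    if metadata.get("section_run"):
        return list(metadata["section_run"]), "section_run"
    if metadata.get("section_registry"):
        return list(metadata["section_registry"]), "section_registry"
    # Fall back to the default order, keeping only sections actually present.
    present = [name for name in default_order if name in implementation_sections]
    return present, "default"


# Hypothetical default ordering, for illustration only.
default_order = ["reader_config", "pipeline_config", "writer_config"]
implementation = {"writer_config": {}, "reader_config": {}}

from_default = sections_in_order({}, implementation, default_order)
from_run = sections_in_order(
    {"section_registry": ["a", "b"], "section_run": ["b"]}, implementation, default_order
)
```

In this sketch from_default is (["reader_config", "writer_config"], "default") and from_run is (["b"], "section_run"), showing that section_run wins when both metadata keys are set.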

primrose.configuration.configuration_dag module

A class that creates a directed acyclic graph (DAG) and performs a number of checks, such as detecting cycles, orphans, and unrecognized nodes

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.configuration.configuration_dag.ConfigurationDag(config)

Bases: object

static add_edge(G, G2, node_names, key, destination)

add an edge to the DAG

Parameters
  • G (networkx bidirectional graph) – bidirectional graph instance

  • G2 (networkx directional graph) – directional graph instance

  • key (str) – starting node name

  • destination (str) – destination node name

Returns

nothing. Side effect is to add the edge

check_connected_components()

Count the number of connected components. More than one is a problem

Raises

ConfigurationError if multiple connected components
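
To show what counting connected components means for a job graph, here is a standard-library sketch that treats edges as undirected and counts the separate groups. It is an illustration only: the class itself works with networkx graphs, and the function name and the node and edge lists here are hypothetical.

```python
from collections import defaultdict


def count_connected_components(nodes, edges):
    """Count components of a directed graph, ignoring edge direction."""
    neighbours = defaultdict(set)
    for start, end in edges:
        neighbours[start].add(end)
        neighbours[end].add(start)

    seen = set()
    components = 0
    for node in nodes:
        if node in seen:
            continue
        # Each unseen node starts a new component; flood-fill everything it reaches.
        components += 1
        stack = [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(neighbours[current] - seen)
    return components


nodes = ["read_data", "write_output", "orphan"]
edges = [("read_data", "write_output")]
component_count = count_connected_components(nodes, edges)  # 2: the orphan is its own component
```

A count above one signals an orphaned node or a disconnected sub-pipeline, which is the condition the check reports.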

check_dag()

check that it is a DAG

Note

Checks for cycles, checks that there is only 1 connected component (no orphans), and checks that all edges point to known nodes.

Raises

Exceptions if cycles found or multiple connected components

check_for_cycles()

check for cycles

Raises

ConfigurationError if cycles found
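
Cycle detection in a directed graph can be sketched with Kahn's algorithm: repeatedly remove nodes that have no remaining incoming edges, and report a cycle if some nodes can never be removed. This is a named substitute technique for illustration, not a description of how the class performs the check; the function name is hypothetical and every edge endpoint is assumed to appear in the node list.

```python
from collections import deque


def has_cycle(nodes, edges):
    """Return True if the directed graph contains a cycle (Kahn's algorithm)."""
    in_degree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for start, end in edges:
        children[start].append(end)
        in_degree[end] += 1

    ready = deque(node for node, degree in in_degree.items() if degree == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)
    # Nodes on a cycle never reach in-degree zero, so they are never visited.
    return visited != len(nodes)


nodes = ["read", "clean", "write"]
acyclic = has_cycle(nodes, [("read", "clean"), ("clean", "write")])  # False
cyclic = has_cycle(nodes, [("read", "clean"), ("clean", "write"), ("write", "read")])  # True
```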

static check_node_exists(node_names, key)

check that some specified destination is a node on the graph

Parameters
  • node_names (list) – list of node names

  • key (str) – name of node to check

Raises

ConfigurationError

create_dag()

Create the DAG

Returns

nothing. Side effect is to set up graphs and node map

descendents(source)

Get the list of descendents from source, i.e. subgraph below source

Parameters

source (str) – name of source

Returns

list of descendents of source node
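
As an illustration of "the subgraph below source", the following sketch collects every node reachable from a source with a breadth-first search over a plain adjacency dictionary. The name descendants_of and the example graph are hypothetical; the class itself operates on its own graph objects.

```python
from collections import deque


def descendants_of(children, source):
    """Return all nodes reachable from source, excluding source, in breadth-first order."""
    found = []
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# Hypothetical job graph: node name -> list of destination names.
children = {"read": ["clean"], "clean": ["model", "write"], "model": ["write"]}
below_read = descendants_of(children, "read")    # ["clean", "model", "write"]
below_write = descendants_of(children, "write")  # []
```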

nodes_of_type(operation_type)

get set of nodes of a given operation type (OperationType.reader, OperationType.writer etc)

Parameters

operation_type (OperationType) – OperationType

Returns

set of keys, if any, of the given operation type

paths(source, target)

return the paths, if any, from a given source node to a given target node

Parameters
  • source (str) – name of node which is starting point of path

  • target (str) – name of node which is end point of path

Returns

list of list of nodes (in order) forming the paths, or None if no path
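
The return shape described above (a list of node lists, or None) can be illustrated with a depth-first enumeration of simple paths. This is a sketch under assumptions: the function name all_paths and the adjacency dictionary are hypothetical, and the ordering of paths here follows the order of the destination lists, which may differ from the class's output.

```python
def all_paths(children, source, target):
    """Return every simple path from source to target, or None if there is none."""
    found = []

    def walk(node, path):
        if node == target:
            found.append(path)
            return
        for child in children.get(node, []):
            if child not in path:  # skip nodes already on this path
                walk(child, path + [child])

    walk(source, [source])
    return found or None


# Hypothetical job graph: node name -> list of destination names.
children = {"read": ["clean"], "clean": ["model", "write"], "model": ["write"]}
forward = all_paths(children, "read", "write")
backward = all_paths(children, "write", "read")  # None: edges only point downstream
```

In this example forward holds two paths, one through model and one that goes from clean straight to write.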

plot_dag(filename, traverser, node_size=500, label_font_size=12, text_angle=0, image_width=16, image_height=12)

plot the DAG to image file

Parameters
  • filename (str) – path to write image to

  • title (str) – title to add to chart

  • node_size (int) – node size

  • label_font_size (int) – font size

  • text_angle (int) – angle to rotate. This is angle in degrees counter clockwise from east

  • image_width (int) – width of image in inches

  • image_height (int) – height of image in inches

Returns

nothing. Saves image to file

starting_nodes()

Where does the DAG start? Compute list of starting (level 0) nodes

Returns

list of node names
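
Level 0 nodes are those with no incoming edges. A minimal sketch of that idea, using hypothetical node and edge lists rather than the class's internal graph:

```python
def starting_nodes(nodes, edges):
    """Return nodes with no incoming edges, in their original order."""
    has_parent = {end for _, end in edges}
    return [node for node in nodes if node not in has_parent]


nodes = ["read_a", "read_b", "merge", "write"]
edges = [("read_a", "merge"), ("read_b", "merge"), ("merge", "write")]
starts = starting_nodes(nodes, edges)  # ["read_a", "read_b"]
```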

upstream_keys(instance_name)

get list of keys (names of nodes in the DAG) that feed into instance_name node

Parameters

instance_name (str) – name of instance

Returns

list of nodes

upstream_nodes_of_type(target_node_name, operation_type)
get set of nodes of a given operation type (OperationType.reader, OperationType.writer etc)

upstream of some given target node

Parameters

operation_type (OperationType) – OperationType

Returns

set of keys, if any, of the given operation type

upstream_typed_keys(instance_name)

get dictionary of the upstream keys with Operation types as values

Parameters

instance_name (str) – name of instance

Returns

dictionary of {name: node type}

Return type

dict
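
The shape of the returned mapping can be sketched as follows. This is an illustration under assumptions: "upstream" is taken to mean direct predecessors only, node types are shown as plain strings rather than OperationType members, and the function signature and data are hypothetical.

```python
def upstream_typed_keys(edges, node_types, instance_name):
    """Map each node feeding directly into instance_name to its operation type."""
    return {start: node_types[start] for start, end in edges if end == instance_name}


# Hypothetical graph and type lookup.
edges = [("read_data", "train_model"), ("clean_data", "train_model"), ("train_model", "write_output")]
node_types = {
    "read_data": "reader_config",
    "clean_data": "pipeline_config",
    "train_model": "model_config",
    "write_output": "writer_config",
}
feeding_model = upstream_typed_keys(edges, node_types, "train_model")
```

Here feeding_model is {"read_data": "reader_config", "clean_data": "pipeline_config"}.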

primrose.configuration.util module

set of utility methods and enum for configurations

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

exception primrose.configuration.util.ConfigurationError

Bases: Exception

named error specifically for configuration errors

class primrose.configuration.util.ConfigurationSectionType

Bases: enum.Enum

set of top-level sections in config

IMPLEMENTATION_CONFIG = 'implementation_config'
METADATA = 'metadata'
values = <function ConfigurationSectionType.values>

class primrose.configuration.util.OperationType

Bases: enum.Enum

set of operation type identifiers

cleanup = 'cleanup_config'
dataviz = 'dataviz_config'
model = 'model_config'
names = <function OperationType.names>
pipeline = 'pipeline_config'
postprocess = 'postprocess_config'
reader = 'reader_config'
values = <function OperationType.values>
values_to_names = <function OperationType.values_to_names>
writer = 'writer_config'
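
The names, values and values_to_names entries above are functions attached to the enum rather than members. A sketch of how such helpers can sit on an Enum is shown below, using a hypothetical two-member stand-in; the return types (lists and a dictionary) are assumptions, since this page does not document them.

```python
from enum import Enum


class DemoOperationType(Enum):
    """Hypothetical stand-in for the documented enum, with two members only."""

    reader = "reader_config"
    writer = "writer_config"

    # Functions defined in an Enum body are not turned into members.
    @staticmethod
    def names():
        return [member.name for member in DemoOperationType]

    @staticmethod
    def values():
        return [member.value for member in DemoOperationType]

    @staticmethod
    def values_to_names():
        return {member.value: member.name for member in DemoOperationType}


section_names = DemoOperationType.values()          # ["reader_config", "writer_config"]
lookup = DemoOperationType.values_to_names()        # {"reader_config": "reader", ...}
member = DemoOperationType("reader_config")         # lookup by value gives DemoOperationType.reader
```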

Module contents