primrose.readers package

Submodules

primrose.readers.csv_reader module

Module with AbstractNode implementation, able to read from CSV

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.csv_reader.CsvReader(configuration, instance_name)

Bases: primrose.base.reader.AbstractReader

Reads a CSV file into a pandas dataframe

get_optional_config()

Optionally get kwargs to pass to pandas csv reader.

Notes

kwargs (dict): dictionary of kwargs key-value pairs

Example

"csv_reader": { "class": "CsvReader", "filename": "data/mydata.csv", "kwargs": { "header": None, "sep": ":" } }
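The Example above is written like a JSON fragment but uses Python's None. The sketch below assumes the node configuration is loaded from JSON, where null becomes None after parsing, and that the optional kwargs are forwarded unchanged to pandas.read_csv. get_optional_config and read_csv_node here are hypothetical helpers, not primrose's own code.

```python
import json

# Node configuration as it might appear in a JSON config file (assumption):
# JSON has no None, so "header": null is used instead.
CONFIG_TEXT = """
{
  "csv_reader": {
    "class": "CsvReader",
    "filename": "data/mydata.csv",
    "kwargs": {"header": null, "sep": ":"}
  }
}
"""


def get_optional_config(node_config):
    """Hypothetical helper: return the optional pandas kwargs, or {} if absent."""
    return dict(node_config.get("kwargs", {}))


def read_csv_node(node_config):
    """Hypothetical helper: read the configured file with pandas.read_csv."""
    import pandas as pd  # imported lazily so the sketch loads without pandas

    return pd.read_csv(node_config["filename"], **get_optional_config(node_config))


node_config = json.loads(CONFIG_TEXT)["csv_reader"]
print(get_optional_config(node_config))  # {'header': None, 'sep': ':'}
```

Defaulting kwargs to an empty dict means a node configured with only class and filename falls back to pandas' own read_csv defaults.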

static necessary_config(node_config)

Returns the necessary configuration keys for the CsvReader object

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

filename: name of the file

Returns

set of necessary keys for the CsvReader object

run(data_object)

Read CSV to a pandas dataframe

Parameters

data_object – DataObject instance

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): should we terminate the DAG? true or false

Return type

(tuple)

primrose.readers.database_helper module

Helper function to get os.environ values

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

primrose.readers.database_helper.get_env_val(key)

get the value of an environment variable

Returns

env variable (str)
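A minimal sketch of a get_env_val-style lookup over os.environ. Raising KeyError when the variable is unset is an assumption made here; the documented helper only promises to return a str.

```python
import os


def get_env_val(key):
    """Sketch: return os.environ[key] as a str, failing loudly if it is unset."""
    value = os.environ.get(key)
    if value is None:
        # Assumption: a missing variable is an error rather than a silent None.
        raise KeyError(f"environment variable {key!r} is not set")
    return value
```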

primrose.readers.deserializer module

Module with AbstractNode implementation, able to read local and GCS dill and pickle files

Author(s):

Mike Skarlinski (michael.skarlinski@weightwatchers.com)

Brian Graham (brian.graham@weightwatchers.com)

class primrose.readers.deserializer.Deserializer(configuration, instance_name)

Bases: primrose.base.reader.AbstractReader

Read a local file and de-serialize it into memory.

DATA_KEY = 'reader_data'
SUPPORTED_DESERIALIZERS = {'dill': <module 'dill'>, 'pickle': <module 'pickle'>}
static necessary_config(node_config)

Returns the necessary configuration keys for the Deserializer object

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

filename (str): local filename to be de-serialized

deserializer (str): 'dill' or 'pickle'

Returns

set of necessary keys for the Deserializer object

run(data_object)

Read dill or pickle object(s) from the local filesystem

Parameters

data_object – DataObject instance

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): terminate the DAG?

Return type

(tuple)
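A minimal sketch of the deserializer dispatch and of the (data_object, terminate) return contract. It assumes a plain dict stands in for DataObject and that only pickle is guaranteed to be installed (dill is registered when it is importable); deserialize_file and run are illustrative names, not primrose's implementation.

```python
import pickle

# Map the configured "deserializer" value to a module exposing load(f).
SUPPORTED_DESERIALIZERS = {"pickle": pickle}
try:
    import dill  # optional dependency

    SUPPORTED_DESERIALIZERS["dill"] = dill
except ImportError:
    pass

DATA_KEY = "reader_data"


def deserialize_file(filename, deserializer):
    """Load one object from a local file with the configured deserializer."""
    if deserializer not in SUPPORTED_DESERIALIZERS:
        raise ValueError(f"unsupported deserializer: {deserializer!r}")
    with open(filename, "rb") as f:
        return SUPPORTED_DESERIALIZERS[deserializer].load(f)


def run(node_config, data_object):
    """Return (data_object, terminate); data_object is a plain dict in this sketch."""
    obj = deserialize_file(node_config["filename"], node_config["deserializer"])
    data_object[DATA_KEY] = obj  # hypothetical storage; primrose's DataObject differs
    terminate = False  # this sketch never asks the DAG to stop
    return data_object, terminate
```

Both pickle and dill can execute arbitrary code while loading, so only files from trusted sources should be de-serialized this way.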

class primrose.readers.deserializer.GcsDeserializer(configuration, instance_name)

Bases: primrose.base.reader.AbstractReader

Read a file from GCS and de-serialize it into memory.

DATA_KEY = 'reader_data'
SUPPORTED_DESERIALIZERS = {'dill': <module 'dill'>, 'pickle': <module 'pickle'>}
download_blobs_as_strings()

Downloads the blobs in the bucket whose names contain the user-specified blob_name

Returns

list of strings
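A sketch of download_blobs_as_strings, assuming the google-cloud-storage client library, where Client.list_blobs enumerates a bucket and Blob.download_as_bytes fetches a payload. The client is passed in as a parameter (an assumption made here so the function can be exercised with a fake); the returned values are bytes, which is what pickle.loads and dill.loads expect.

```python
def download_blobs_as_strings(client, bucket_name, blob_name):
    """Return the payload of every blob in bucket_name whose name contains blob_name.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    payloads = []
    for blob in client.list_blobs(bucket_name):
        if blob_name in blob.name:
            # Raw serialized bytes, ready for pickle.loads / dill.loads.
            payloads.append(blob.download_as_bytes())
    return payloads


# Example with a real client (requires google-cloud-storage and credentials;
# the bucket and blob names are hypothetical):
# from google.cloud import storage
# payloads = download_blobs_as_strings(storage.Client(), "my-bucket", "models/")
```

Filtering on substring containment mirrors the wording above; if blob_name is always a prefix, passing prefix=blob_name to list_blobs avoids listing the whole bucket.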

static necessary_config(node_config)

Returns the necessary configuration keys for the GcsDeserializer object

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

bucket_name: name of the GCS bucket

blob_name: name of the blob

deserializer (str): 'dill' or 'pickle'

Returns

set of necessary keys for the GcsDeserializer object

run(data_object)

Read serialized object(s) from the blobs in the GCS bucket whose names contain blob_name

Parameters

data_object – DataObject instance

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): terminate the DAG?

Return type

(tuple)

primrose.readers.dill_reader module

Module with AbstractNode implementation, able to read from local dill files

Author(s):

Mike Skarlinski (michael.skarlinski@weightwatchers.com)

class primrose.readers.dill_reader.DillReader(configuration, instance_name)

Bases: primrose.base.reader.AbstractReader

Read a local file and un-dill it into memory

DATA_KEY = 'reader_data'
static necessary_config(node_config)

Returns the necessary configuration keys for the DillReader object

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

filename: local filename to be de-serialized

Returns

set of necessary keys for the DillReader object

run(data_object)

Read dill object(s) from the local filesystem

Parameters

data_object – DataObject instance

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): terminate the DAG?

Return type

(tuple)

primrose.readers.gcs_dill_reader module

Module with AbstractNode implementation, able to read from GCS

Author(s):

Mike Skarlinski (michael.skarlinski@weightwatchers.com)

class primrose.readers.gcs_dill_reader.GcsDillReader(configuration, instance_name)

Bases: primrose.base.reader.AbstractReader

Read a file from GCS and un-dill it into memory

DATA_KEY = 'reader_data'
download_blobs_as_strings()

Downloads the blobs in the bucket whose names contain the user-specified blob_name

Returns

list of strings

static necessary_config(node_config)

Returns the necessary configuration keys for the GcsDillReader object

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

bucket_name: name of the GCS bucket

blob_name: name of the blob

Returns

set of necessary keys for the GcsDillReader object

run(data_object)

Read dill object(s) from the blobs in the GCS bucket whose names contain blob_name

Parameters

data_object – DataObject instance

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): terminate the DAG?

Return type

(tuple)

primrose.readers.mysql_helper module

MySQL helper

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.mysql_helper.MySQLHelper

Bases: object

some utility methods for connecting to MySQL

static create_db_connection()

authenticate with the MySQL database

Returns

MySQL db object

Return type

db (connection object)

static extract_mysql_credentials()

extract MySQL credentials from config

Returns

tuple containing:

host (str): host

port (int): port

database (str): database name

username (str): username

password (str): password

Return type

(tuple)
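This page does not say where MySQLHelper reads its credentials from. The sketch below assumes environment variables (in the spirit of database_helper.get_env_val) with hypothetical names such as MYSQL_HOST, and PyMySQL as the driver; both are illustrative choices, not primrose's documented behavior.

```python
import os


def extract_mysql_credentials(env=os.environ):
    """Return (host, port, database, username, password).

    The variable names below are hypothetical placeholders.
    """
    host = env["MYSQL_HOST"]
    port = int(env.get("MYSQL_PORT", "3306"))  # 3306 is MySQL's default port
    database = env["MYSQL_DATABASE"]
    username = env["MYSQL_USER"]
    password = env["MYSQL_PASSWORD"]
    return host, port, database, username, password


def create_db_connection(env=os.environ):
    """Open a connection with PyMySQL (an assumed driver choice)."""
    import pymysql  # imported lazily; not needed for extract_mysql_credentials

    host, port, database, username, password = extract_mysql_credentials(env)
    return pymysql.connect(
        host=host, port=port, user=username, password=password, database=database
    )
```

Taking the mapping as a parameter keeps credential parsing testable without a live database or modified process environment.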

primrose.readers.mysql_reader module

Module with AbstractReader implementation, able to read from MySQL

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.mysql_reader.MySQLReader(configuration, instance_name)

Bases: primrose.base.sql_reader.AbstractSqlReader

Runs MySQL queries into pandas dataframes

get_connection()

return connection to MySQL DB

Returns

connection to MySQL DB

primrose.readers.oracle_reader module

Module with AbstractReader implementation, able to read from Oracle

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.oracle_reader.OracleReader(configuration, instance_name)

Bases: primrose.base.sql_reader.AbstractSqlReader

Runs Oracle queries into pandas dataframes

get_connection()

return connection to Oracle DB

Returns

connection to Oracle DB

primrose.readers.postgres_helper module

Postgres helper

Author(s):

Wassym Kalouache (wassym.kalouache@weightwatchers.com)

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.postgres_helper.PostgresHelper

Bases: object

some utility methods for connecting to postgres

static create_db_connection()

authenticate with the PostgreSQL database

Returns

postgres db object

Return type

db (connection)

static extract_postgres_credentials()

extract PostgreSQL credentials from config

Returns

tuple containing:

host (str): host

port (int): port

database (str): database name

username (str): username

password (str): password

Return type

(tuple)

primrose.readers.postgres_reader module

Module with AbstractReader implementation, able to read from PostgreSQL

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.postgres_reader.PostgresReader(configuration, instance_name)

Bases: primrose.base.sql_reader.AbstractSqlReader

Runs PostgreSQL queries into pandas dataframes

get_connection()

return connection to PostgreSQL DB

Returns

connection to PostgreSQL DB

primrose.readers.redshift_reader module

Module with AbstractReader implementation, able to read from AWS Redshift

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.redshift_reader.OracleReader(configuration, instance_name)

Bases: primrose.base.sql_reader.AbstractSqlReader

Runs Redshift queries into pandas dataframes

get_connection()

return connection to Redshift DB

Returns

connection to Redshift DB

primrose.readers.sklearn_dataset_reader module

Module to read canned datasets from sklearn

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.sklearn_dataset_reader.SklearnDatasetReader(configuration, instance_name)

Bases: primrose.base.reader.AbstractReader

Read data from sklearn.datasets into a pandas dataframe

static necessary_config(node_config)

Returns the necessary configuration keys for the SklearnDatasetReader object

Note

dataset (str): name of a supported sklearn.datasets dataset. One of "iris", "boston", "diabetes", "breast_cancer", "linnerud", "wine"

Returns

set of necessary keys for the SklearnDatasetReader object

run(data_object)

Read sklearn dataset to a pandas dataframe

Parameters

data_object – DataObject instance

Returns

tuple containing:

data_object (DataObject): instance of DataObject

terminate (bool): should we terminate the DAG? true or false

Return type

(tuple)
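A sketch of how a dataset name from the config could be resolved to a sklearn.datasets loader and turned into a DataFrame. The mapping follows scikit-learn's load_* naming; placing the target in a "target" column (or one column per target for linnerud, which has three) is an assumed layout, not necessarily the reader's. Note that load_boston was removed in scikit-learn 1.2, so "boston" only works with older releases.

```python
SUPPORTED_DATASETS = {
    "iris": "load_iris",
    "boston": "load_boston",  # removed in scikit-learn 1.2
    "diabetes": "load_diabetes",
    "breast_cancer": "load_breast_cancer",
    "linnerud": "load_linnerud",
    "wine": "load_wine",
}


def loader_name(dataset):
    """Map a configured dataset name to its sklearn.datasets loader function name."""
    try:
        return SUPPORTED_DATASETS[dataset]
    except KeyError:
        raise ValueError(
            f"unsupported dataset {dataset!r}; expected one of {sorted(SUPPORTED_DATASETS)}"
        ) from None


def read_dataset(dataset):
    """Load the dataset into a DataFrame of features plus target column(s)."""
    import pandas as pd  # lazy imports keep loader_name usable without these packages
    import sklearn.datasets

    bunch = getattr(sklearn.datasets, loader_name(dataset))()
    df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    if bunch.target.ndim == 1:
        df["target"] = bunch.target
    else:
        # Multi-output datasets such as linnerud get one column per target.
        for i, name in enumerate(bunch.target_names):
            df[name] = bunch.target[:, i]
    return df
```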

primrose.readers.sqlite_reader module

Module with AbstractReader implementation, able to read from SQLite

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.sqlite_reader.SQLiteReader(configuration, instance_name)

Bases: primrose.base.sql_reader.AbstractSqlReader

Runs SQLite queries into pandas dataframes

get_connection()

return connection to SQLite DB

Returns

connection to SQLite DB file
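SQLite needs no server, so get_connection can be as simple as sqlite3.connect on a file path. The sketch below uses only the standard library; the filename is passed directly because the configuration key that holds it is not documented here, and run_query is a hypothetical stand-in for the dataframe step. In the reader itself, pandas.read_sql(query, connection) accepts a sqlite3 connection directly.

```python
import sqlite3


def get_connection(filename):
    """Return a connection to the SQLite database file (":memory:" also works)."""
    return sqlite3.connect(filename)


def run_query(connection, query):
    """Run a query and return (column_names, rows)."""
    cursor = connection.execute(query)
    # cursor.description holds a 7-tuple per column; the first item is its name.
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()
```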

static necessary_config(node_config)

Return the set of necessary configuration keys for the implementation

Parameters

node_config (dict) – set of parameters / attributes for the node

Note

Once this set is defined, validation occurs automatically before instantiation in the pipeline factory.

Returns

set of keys necessary to run implementation
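The validation step described above amounts to a set difference between the declared keys and the node's configuration. A minimal sketch, assuming the factory raises on missing keys; validate_node_config and CsvReaderSketch are hypothetical names, and "filename" is the key CsvReader lists as necessary.

```python
def validate_node_config(instance_name, node_config, necessary_keys):
    """Raise ValueError if any necessary key is missing from node_config."""
    missing = set(necessary_keys) - set(node_config)
    if missing:
        raise ValueError(f"{instance_name}: missing necessary config keys {sorted(missing)}")


class CsvReaderSketch:
    """Hypothetical stand-in mirroring CsvReader.necessary_config."""

    @staticmethod
    def necessary_config(node_config):
        return {"filename"}
```

Returning a set (rather than a list) makes the missing-key check a single set operation and ignores duplicates.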

Module contents