primrose.readers package
Submodules

primrose.readers.csv_reader module
Module with AbstractNode implementation, able to read from CSV
Author(s): Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.csv_reader.CsvReader(configuration, instance_name)
Bases: primrose.base.reader.AbstractReader
Reads a CSV file into a pandas dataframe.

get_optional_config()
Optionally get kwargs to pass to the pandas CSV reader.
Notes
kwargs (dict): dictionary of kwargs key-value pairs
Example
"csv_reader": {
    "class": "CsvReader",
    "filename": "data/mydata.csv",
    "kwargs": {
        "header": null,
        "sep": ":"
    }
}
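A minimal sketch of how the optional kwargs could be consumed, assuming CsvReader forwards them unchanged to pandas.read_csv. The helper read_csv_with_optional_kwargs and its source parameter are hypothetical. The config is parsed as JSON, where null becomes Python None, so header=None tells pandas the data has no header row.

```python
import io
import json

import pandas as pd

# Hypothetical config mirroring the documented example; JSON null maps to Python None.
CONFIG_TEXT = """
{
    "csv_reader": {
        "class": "CsvReader",
        "filename": "data/mydata.csv",
        "kwargs": {"header": null, "sep": ":"}
    }
}
"""


def read_csv_with_optional_kwargs(node_config, source=None):
    """Forward the optional "kwargs" block to pandas.read_csv (illustrative sketch only)."""
    kwargs = node_config.get("kwargs", {})
    # Hypothetical `source` override lets the example read an in-memory buffer instead of a file.
    target = source if source is not None else node_config["filename"]
    return pd.read_csv(target, **kwargs)


node_config = json.loads(CONFIG_TEXT)["csv_reader"]
df = read_csv_with_optional_kwargs(node_config, io.StringIO("a:1\nb:2\n"))
print(df)
```

Because header is None, pandas assigns the integer column labels 0 and 1 instead of treating the first line as column names.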

static necessary_config(node_config)
Returns the necessary configuration keys for the CsvReader object.
Parameters
node_config (dict) – set of parameters / attributes for the node
Note
filename: name of the file
Returns
set of necessary keys for the CsvReader object

run(data_object)
Read a CSV file into a pandas dataframe.
Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
Return type
(tuple)

primrose.readers.database_helper module
Helper function for getting os.environ values
Author(s): Carl Anderson (carl.anderson@weightwatchers.com)

primrose.readers.database_helper.get_env_val(key)
Get an environment variable.
Returns
env variable (str)
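A minimal sketch of an environment lookup in the spirit of get_env_val, written for illustration only. Raising KeyError for a missing variable is an assumption, as is the variable name EXAMPLE_DB_HOST; the real helper may return None or behave differently.

```python
import os


def get_env_val(key):
    """Illustrative sketch: look up an environment variable by name.

    Assumption: a missing variable raises instead of returning None; the real helper may differ.
    """
    value = os.environ.get(key)
    if value is None:
        raise KeyError(f"environment variable {key!r} is not set")
    return value


os.environ["EXAMPLE_DB_HOST"] = "localhost"  # hypothetical variable name
print(get_env_val("EXAMPLE_DB_HOST"))
```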

primrose.readers.deserializer module
Module with AbstractNode implementation, able to read from local and GCS dill and pickle files
Author(s): Mike Skarlinski (michael.skarlinski@weightwatchers.com), Brian Graham (brian.graham@weightwatchers.com)

class primrose.readers.deserializer.Deserializer(configuration, instance_name)
Bases: primrose.base.reader.AbstractReader
Read a local file and de-serialize it into memory.

DATA_KEY = 'reader_data'

SUPPORTED_DESERIALIZERS = {'dill': <module 'dill'>, 'pickle': <module 'pickle'>}

static necessary_config(node_config)
Returns the necessary configuration keys for the Deserializer object.
Parameters
node_config (dict) – set of parameters / attributes for the node
Note
filename (str): local filename to be de-serialized
deserializer (str): 'dill' or 'pickle'
Returns
set of necessary keys for the Deserializer object

run(data_object)
Read dill or pickle object(s) from the local filesystem.
Parameters
data_object – DataObject instance
Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
Return type
(tuple)
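A sketch of the local read path, assuming the deserializer config value names a module that exposes load(file), which both pickle and dill do. The function deserialize_local_file is hypothetical, and resolving the module with importlib is this sketch's choice; the class itself keeps the mapping in SUPPORTED_DESERIALIZERS.

```python
import importlib
import os
import pickle
import tempfile

SUPPORTED = ("dill", "pickle")


def deserialize_local_file(filename, deserializer):
    """Sketch: load a local file with the named module ("dill" or "pickle")."""
    if deserializer not in SUPPORTED:
        raise ValueError(f"unsupported deserializer {deserializer!r}; expected one of {SUPPORTED}")
    # Both dill and pickle expose load(file); dill must be installed to use "dill".
    module = importlib.import_module(deserializer)
    with open(filename, "rb") as handle:
        return module.load(handle)


# Usage: write a pickle to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.pkl")
    with open(path, "wb") as handle:
        pickle.dump({"weights": [0.1, 0.2]}, handle)
    restored = deserialize_local_file(path, "pickle")

print(restored)
```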

class primrose.readers.deserializer.GcsDeserializer(configuration, instance_name)
Bases: primrose.base.reader.AbstractReader
Read a file from GCS and de-serialize it into memory.

DATA_KEY = 'reader_data'

SUPPORTED_DESERIALIZERS = {'dill': <module 'dill'>, 'pickle': <module 'pickle'>}

download_blobs_as_strings()
Downloads the blob(s) from the bucket whose names contain the user-specified blob_name.
Returns
list of strings

static necessary_config(node_config)
Returns the necessary configuration keys for the GcsDeserializer object.
Parameters
node_config (dict) – set of parameters / attributes for the node
Note
bucket_name: name of the GCS bucket
blob_name: name of the blob
deserializer (str): 'dill' or 'pickle'
Returns
set of necessary keys for the GcsDeserializer object

run(data_object)
Read serialized object(s) from the blobs in the GCS bucket whose names contain blob_name.
Parameters
data_object – DataObject instance
Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
Return type
(tuple)
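A sketch of the GCS path, assuming blob_name is treated as a name prefix when listing blobs. With the google-cloud-storage client, bucket would come from storage.Client().bucket(bucket_name), and the sketch relies only on its list_blobs(prefix=...) and Blob.download_as_bytes() calls (older client releases offered download_as_string for the same purpose). The helper names and the in-memory FakeBucket/FakeBlob stand-ins are hypothetical, so the example runs without credentials.

```python
import pickle


def download_blobs_as_bytes(bucket, blob_name):
    """Sketch: fetch every blob whose name starts with blob_name (prefix matching is assumed)."""
    return [blob.download_as_bytes() for blob in bucket.list_blobs(prefix=blob_name)]


def deserialize_blobs(bucket, blob_name, loads=pickle.loads):
    """Deserialize each downloaded payload; pass dill.loads to handle dill payloads."""
    return [loads(payload) for payload in download_blobs_as_bytes(bucket, blob_name)]


# Minimal in-memory stand-ins for a GCS bucket and its blobs, used only for this example.
class FakeBlob:
    def __init__(self, name, payload):
        self.name = name
        self._payload = payload

    def download_as_bytes(self):
        return self._payload


class FakeBucket:
    def __init__(self, blobs):
        self._blobs = blobs

    def list_blobs(self, prefix=None):
        return [b for b in self._blobs if prefix is None or b.name.startswith(prefix)]


bucket = FakeBucket([
    FakeBlob("models/a.pkl", pickle.dumps("model-a")),
    FakeBlob("models/b.pkl", pickle.dumps("model-b")),
    FakeBlob("other/c.pkl", pickle.dumps("ignored")),
])
objects = deserialize_blobs(bucket, "models/")
print(objects)
```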

primrose.readers.dill_reader module
Module with AbstractNode implementation, able to read from local dill files
Author(s): Mike Skarlinski (michael.skarlinski@weightwatchers.com)

class primrose.readers.dill_reader.DillReader(configuration, instance_name)
Bases: primrose.base.reader.AbstractReader
Reads a local file and un-dills it into memory.

DATA_KEY = 'reader_data'

static necessary_config(node_config)
Returns the necessary configuration keys for the DillReader object.
Parameters
node_config (dict) – set of parameters / attributes for the node
Note
filename: local filename to be de-serialized
Returns
set of necessary keys for the DillReader object

run(data_object)
Read dill object(s) from the local filesystem.
Parameters
data_object – DataObject instance
Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
Return type
(tuple)

primrose.readers.gcs_dill_reader module
Module with AbstractNode implementation, able to read dill files from GCS
Author(s): Mike Skarlinski (michael.skarlinski@weightwatchers.com)

class primrose.readers.gcs_dill_reader.GcsDillReader(configuration, instance_name)
Bases: primrose.base.reader.AbstractReader
Reads a file from GCS and un-dills it into memory.

DATA_KEY = 'reader_data'

download_blobs_as_strings()
Downloads the blob(s) from the bucket whose names contain the user-specified blob_name.
Returns
list of strings

static necessary_config(node_config)
Returns the necessary configuration keys for the GcsDillReader object.
Parameters
node_config (dict) – set of parameters / attributes for the node
Note
bucket_name: name of the GCS bucket
blob_name: name of the blob
Returns
set of necessary keys for the GcsDillReader object

run(data_object)
Read dill object(s) from the blobs in the GCS bucket whose names contain blob_name.
Parameters
data_object – DataObject instance
Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
Return type
(tuple)

primrose.readers.mysql_helper module
MySQL helper
Author(s): Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.mysql_helper.MySQLHelper
Bases: object
Some utility methods for connecting to MySQL.

static create_db_connection()
Authenticate with the MySQL database.
Returns
MySQL db object
Return type
db (connection object)

static extract_mysql_credentials()
Extract MySQL credentials from config.
Returns
tuple containing:
host (str): host
port (int): port
database (str): database name
username (str): username
password (str): password
Return type
(tuple)
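The docstring says the credentials come from config; given the database_helper.get_env_val helper, one plausible shape reads them from environment variables. This is a sketch only: the variable names (MYSQL_HOST and so on) and the environ parameter are hypothetical, and port is converted to int to match the documented return type.

```python
import os

# Hypothetical environment variable names; the real helper may read different keys.
ENV_KEYS = ("MYSQL_HOST", "MYSQL_PORT", "MYSQL_DATABASE", "MYSQL_USERNAME", "MYSQL_PASSWORD")


def extract_mysql_credentials(environ=os.environ):
    """Sketch: return (host, port, database, username, password) from environment variables."""
    missing = [key for key in ENV_KEYS if key not in environ]
    if missing:
        raise KeyError(f"missing environment variables: {missing}")
    host, port, database, username, password = (environ[key] for key in ENV_KEYS)
    # Environment values are strings; the documented return type lists port as int.
    return host, int(port), database, username, password


example_env = {
    "MYSQL_HOST": "db.example.internal",
    "MYSQL_PORT": "3306",
    "MYSQL_DATABASE": "analytics",
    "MYSQL_USERNAME": "reader",
    "MYSQL_PASSWORD": "secret",
}
print(extract_mysql_credentials(example_env))
```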

primrose.readers.mysql_reader module
Module with AbstractReader implementation, able to read from MySQL
Author(s): Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.mysql_reader.MySQLReader(configuration, instance_name)
Bases: primrose.base.sql_reader.AbstractSqlReader
Runs MySQL queries and loads the results into pandas dataframes.

get_connection()
Return a connection to the MySQL DB.
Returns
connection to MySQL DB

primrose.readers.oracle_reader module
Module with AbstractReader implementation, able to read from Oracle
Author(s): Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.oracle_reader.OracleReader(configuration, instance_name)
Bases: primrose.base.sql_reader.AbstractSqlReader
Runs Oracle queries and loads the results into pandas dataframes.

get_connection()
Return a connection to the Oracle DB.
Returns
connection to Oracle DB

primrose.readers.postgres_helper module
Postgres helper
Author(s): Wassym Kalouache (wassym.kalouache@weightwatchers.com), Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.postgres_helper.PostgresHelper
Bases: object
Some utility methods for connecting to Postgres.

static create_db_connection()
Authenticate with the Postgres database.
Returns
postgres db object
Return type
db (connection)

static extract_postgres_credentials()
Extract PostgreSQL credentials from config.
Returns
tuple containing:
host (str): host
port (int): port
database (str): database name
username (str): username
password (str): password
Return type
(tuple)

primrose.readers.postgres_reader module
Module with AbstractReader implementation, able to read from PostgreSQL
Author(s): Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.postgres_reader.PostgresReader(configuration, instance_name)
Bases: primrose.base.sql_reader.AbstractSqlReader
Runs PostgreSQL queries and loads the results into pandas dataframes.

get_connection()
Return a connection to the PostgreSQL DB.
Returns
connection to PostgreSQL DB

primrose.readers.redshift_reader module
Module with AbstractReader implementation, able to read from AWS Redshift
Author(s): Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.redshift_reader.OracleReader(configuration, instance_name)
Bases: primrose.base.sql_reader.AbstractSqlReader
Runs Redshift queries and loads the results into pandas dataframes.

get_connection()
Return a connection to the Redshift DB.
Returns
connection to Redshift DB

primrose.readers.sklearn_dataset_reader module
Module to read canned datasets from sklearn
Author(s): Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.sklearn_dataset_reader.SklearnDatasetReader(configuration, instance_name)
Bases: primrose.base.reader.AbstractReader
Read data from sklearn.datasets into a pandas dataframe.

static necessary_config(node_config)
Returns the necessary configuration keys for the SklearnDatasetReader object.
Note
dataset (str): name of a supported sklearn.datasets dataset. One of "iris", "boston", "diabetes", "breast_cancer", "linnerud", "wine"
Returns
set of necessary keys for the SklearnDatasetReader object

run(data_object)
Read an sklearn dataset into a pandas dataframe.
Returns
tuple containing:
data_object (DataObject): instance of DataObject
terminate (bool): terminate the DAG?
Return type
(tuple)
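A sketch of loading one of the named datasets into a dataframe, assuming the reader maps the dataset name to the corresponding sklearn.datasets.load_<name> function. The helper name load_sklearn_dataframe and the "target" column name are assumptions. The sketch handles single-target datasets only: linnerud has a two-dimensional target, and load_boston was removed in scikit-learn 1.2, so "boston" only resolves on older releases.

```python
import pandas as pd
from sklearn import datasets


def load_sklearn_dataframe(name):
    """Sketch: load a canned sklearn dataset by name into a pandas DataFrame.

    Handles single-target datasets such as "iris" or "wine" only.
    """
    loader = getattr(datasets, f"load_{name}")  # e.g. datasets.load_iris
    bunch = loader()
    frame = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    frame["target"] = bunch.target
    return frame


iris = load_sklearn_dataframe("iris")
print(iris.shape)
```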

primrose.readers.sqlite_reader module
Module with AbstractReader implementation, able to read from SQLite
Author(s): Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.readers.sqlite_reader.SQLiteReader(configuration, instance_name)
Bases: primrose.base.sql_reader.AbstractSqlReader
Runs SQLite queries and loads the results into pandas dataframes.

get_connection()
Return a connection to the SQLite DB.
Returns
connection to SQLite DB file
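A sketch of the SQLite connection plus the query-to-dataframe step that AbstractSqlReader presumably performs with it. The "filename" config key, the helper get_sqlite_connection, and the example table are hypothetical; ":memory:" keeps the database in RAM so nothing is written to disk.

```python
import sqlite3

import pandas as pd


def get_sqlite_connection(node_config):
    """Sketch: open the SQLite file named in the node config (the "filename" key is hypothetical)."""
    return sqlite3.connect(node_config["filename"])


# ":memory:" gives a throwaway database so the example needs no file on disk.
connection = get_sqlite_connection({"filename": ":memory:"})
connection.execute("CREATE TABLE users (id INTEGER, name TEXT)")
connection.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "grace")])
connection.commit()

# pandas accepts a sqlite3 connection directly for read_sql_query.
frame = pd.read_sql_query("SELECT id, name FROM users ORDER BY id", connection)
connection.close()
print(frame)
```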

static necessary_config(node_config)
Returns the set of necessary configuration keys within the implementation.
Parameters
node_config (dict) – set of parameters / attributes for the node
Note
Once these keys are declared, validation occurs automatically before instantiation in the pipeline factory.
Returns
set of keys necessary to run implementation
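A minimal sketch of the kind of check the pipeline factory's validation step could perform, comparing the keys returned by necessary_config with the node's config through a set difference. The function validate_node_config and the key set {"filename", "query"} are hypothetical and chosen only for illustration.

```python
def validate_node_config(node_config, necessary_keys):
    """Sketch: reject a node config that lacks any necessary key before instantiation."""
    missing = set(necessary_keys) - set(node_config)
    if missing:
        raise ValueError(f"missing necessary config keys: {sorted(missing)}")


# Hypothetical necessary keys for a SQL reader node, for illustration only.
NECESSARY = {"filename", "query"}

validate_node_config({"class": "SQLiteReader", "filename": "db.sqlite", "query": "SELECT 1"}, NECESSARY)
```

Using a set difference reports every missing key at once instead of failing on the first.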