primrose.models package

Submodules

primrose.models.minimal_search_engine module

concrete class that performs TFIDF on lemmatized tokens with optional ngrams

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.models.minimal_search_engine.MinimalSearchEngine(configuration, instance_name)

Bases: primrose.base.search_engine.AbstractSearchEngine

simple TFIDF search engine

tokenize(s, stopwords=[], add_ngrams=True)
tokenize a string document, optimized for recipe names given default stopwords and other string cleanup operations

Parameters
  • s (str) – some document string

  • stopwords (list) – list of stopwords

  • add_ngrams (bool) – whether to include ngrams on tokens

Returns

tokens (list): list of cleaned, standardized tokens from document
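The tokenization described above can be illustrated with a minimal sketch. This is not the primrose implementation: `tokenize_sketch` is a hypothetical stand-in that assumes lowercasing, alphanumeric splitting and bigrams as the "ngrams", and it skips the lemmatization step mentioned in the module description.

```python
import re


def tokenize_sketch(s, stopwords=None, add_ngrams=True):
    """Hypothetical stand-in for tokenize(); no lemmatization is performed."""
    stopwords = set(stopwords or [])
    # Lowercase the document and keep only runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", s.lower())
    # Drop any stopwords supplied by the caller.
    tokens = [w for w in words if w not in stopwords]
    if add_ngrams:
        # Append bigrams built from adjacent surviving tokens (an assumption;
        # the source does not say which n-gram sizes are used).
        tokens = tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens


print(tokenize_sketch("Grilled Chicken with Rice", stopwords=["with"]))
# ['grilled', 'chicken', 'rice', 'grilled chicken', 'chicken rice']
```

The sketch uses `stopwords=None` rather than a mutable `[]` default, which is the usual way to avoid sharing one list across calls.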

primrose.models.sklearn_classifier_model module

Module to run a basic decision tree model

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

Mike Skarlinski (michael.skarlinski@weightwatchers.com)

class primrose.models.sklearn_classifier_model.SklearnClassifierModel(configuration, instance_name)

Bases: primrose.models.sklearn_model.SklearnModel

eval_model(data_object)

Evaluate model performance on a labeled testing dataset

Returns

instance of DataObject

Return type

data_object (DataObject)

static necessary_config(node_config)

Return the set of necessary configuration keys

Parameters

node_config (dict) – set of parameters / attributes for the node

Notes

  • model_parameters (dict) – parameters that mirror the sklearn kwargs for the user’s model

  • mode – train, eval or predict (see AbstractModel)

  • sklearn_classifier_name – sklearn submodule and model name (submodule.model_name) of the user’s model

  • grid_search_scoring – scoring function name from sklearn CV docs

  • cv_folds – number of CV folds

Returns

set of required keys
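As a rough illustration of the keys listed in the Notes, the sketch below builds a node configuration as a Python dict and checks it for missing keys. The key names come from the Notes; the values, the `REQUIRED_KEYS` constant and the `missing_keys` helper are hypothetical, and a real primrose node may need further keys that are not documented here.

```python
# Key names are taken from the Notes above; every value is an illustrative
# assumption and not a recommended setting.
node_config = {
    "mode": "train",
    "sklearn_classifier_name": "tree.DecisionTreeClassifier",
    # Whether values are scalars or lists of candidates is not stated in
    # the source; lists are assumed here because training uses CV.
    "model_parameters": {"max_depth": [2, 4, 8]},
    "grid_search_scoring": "accuracy",
    "cv_folds": 5,
}

# Hypothetical constant mirroring the keys named in the Notes.
REQUIRED_KEYS = {
    "mode",
    "sklearn_classifier_name",
    "model_parameters",
    "grid_search_scoring",
    "cv_folds",
}


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from config."""
    return set(required) - set(config)


print(missing_keys(node_config))  # set()
```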

predict(data_object)

Make distance-based predictions using the prebuilt matrix

Parameters
  • data_object – DataObject instance

  • load_model – load model object from gcs or not

Returns

data_object with prediction data added

train_model(data_object)

train the model using CV, according to user specified options

Parameters

data_object (DataObject) – instance of DataObject

Returns

Nothing

primrose.models.sklearn_cluster_model module

Module to run a basic clustering model

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.models.sklearn_cluster_model.SklearnClusterModel(configuration, instance_name)

Bases: primrose.models.sklearn_model.SklearnModel

fit_training_data()

fit training data to model

get_scores()

get the scores for X_test

Returns

returns a dictionary of scores

static necessary_config(node_config)

Return the set of necessary configuration keys

Note

X (list): list of columns to use

While you might expect model here, it is needed only in train mode; in predict or eval mode the model is cached

Returns

set of required keys

primrose.models.sklearn_model module

A primrose model based around a sklearn model

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.models.sklearn_model.SklearnModel(configuration, instance_name)

Bases: primrose.base.model.AbstractModel

eval_model(data_object, load_model=False)

evaluate a model by getting the scores

Returns

instance of DataObject

Return type

data_object (DataObject)

static evaluate_no_ground_truth_classifier_metrics(X, labels)

Compute a set of metrics for a classifier where there is no ground truth

Parameters
  • X (dataframe) – the data

  • labels – the predicted classes

Returns

dictionary of score name: value

static evaluate_regression_metrics(actual, predictions)

compute set of metrics for a regression

Parameters
  • actual – vector of actual values

  • predictions – vector of predictions

Returns

dictionary of score name: value
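The shape of the returned dictionary (score name mapped to value) can be sketched in plain Python. The source does not list which metrics evaluate_regression_metrics computes, so the three below (mean squared error, mean absolute error, R²) and the function name `regression_metrics_sketch` are assumptions chosen for illustration.

```python
def regression_metrics_sketch(actual, predictions):
    """Hypothetical example: return a dict of score name -> value."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predictions)]
    # Residual sum of squares and the two average error measures.
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    # Total sum of squares around the mean of the actual values.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 is undefined when the actual values are constant.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {
        "mean_squared_error": mse,
        "mean_absolute_error": mae,
        "r2": r2,
    }


scores = regression_metrics_sketch([1, 2, 3], [1, 2, 4])
print(scores["r2"])  # 0.5
```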

load_model(data_object)

finds an upstream sklearn model

Parameters

data_object (DataObject) – instance of DataObject

Returns

Sklearn model

predict(data_object, load_model=False, use_serial=False)

Predict y_test from X_test

Parameters
  • data_object (DataObject) – instance of DataObject

  • load_model – load model object from gcs or not

Returns

instance of DataObject

Return type

data_object (DataObject)

train_model(data_object)

train the model

Parameters

data_object (DataObject) – instance of DataObject

Returns

instance of DataObject

Return type

data_object (DataObject)

primrose.models.sklearn_regression_model module

Module to run a basic regression model

Author(s):

Carl Anderson (carl.anderson@weightwatchers.com)

class primrose.models.sklearn_regression_model.SklearnRegressionModel(configuration, instance_name)

Bases: primrose.models.sklearn_model.SklearnModel

fit_training_data()

fit training data to model

get_scores()

get the scores for y_test

Returns

dictionary of scores

static necessary_config(node_config)

Return the set of necessary configuration keys

Note

While you might expect model here, it is needed only in train mode; in predict or eval mode the model is cached

Returns

set of required keys

Module contents