Module index

Module for creating datasets from distinct data sources.

class ds.DataSetBuilder(name=None, dataset_path=None, apply_transforms=True, transforms=None, train_size=0.7, valid_size=0.1, validator='cross', dtype='float64', ltype='|S1', description='', author='', compression_level=0, chunks=100, rewrite=False)[source]

Base class for building datasets. Gets data from memory and creates the initial values for the dataset.

Parameters:
  • name (string) – dataset’s name
  • dataset_path (string) – path where the dataset is saved. This param is set automatically from the settings.cfg file.
  • apply_transforms (bool) – apply transformations to the data
  • transforms (Transforms) – the transforms to apply to the data (the functions for preprocessing the data are defined there).
  • train_size (float) – value in [0, 1] that determines the size of the train data
  • valid_size (float) – value in [0, 1] that determines the size of the validation data
  • validator (string) – name of the method used to split the data into train, test, and validation sets
  • dtype (string) – the type of the data to save
  • description (string) – a brief description of the dataset
  • author (string) – Dataset Author’s name
  • compression_level (int) – number in the 0-9 range. If 0 is passed, no compression is applied.
  • rewrite (bool) – if True, the saved data can be cleared and replaced with a new dataset.
  • chunks (int) – number of chunks to use when the dataset is copied or defragmented.
build_dataset(data, labels, test_data=None, test_labels=None, validation_data=None, validation_labels=None)[source]
Parameters:
  • data (ndarray) – array of values to save in the dataset
  • labels (ndarray) – array of labels to save in the dataset
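The proportions implied by train_size and valid_size can be sketched as follows. This is a hypothetical illustration of the split semantics, not the library's implementation; the `split` helper is an assumed name.

```python
# Hypothetical sketch: train_size and valid_size are fractions of the
# data; whatever remains (1 - train_size - valid_size) becomes the test set.
def split(data, train_size=0.7, valid_size=0.1):
    n = len(data)
    n_train = int(n * train_size)
    n_valid = int(n * valid_size)
    train = data[:n_train]
    valid = data[n_train:n_train + n_valid]
    test = data[n_train + n_valid:]
    return train, valid, test

train, valid, test = split(list(range(100)))
print(len(train), len(valid), len(test))  # 70 10 20
```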
convert(name, dtype='float64', ltype='|S1', apply_transforms=False, percentaje=1)[source]
Parameters:
  • name (string) – converted dataset’s name
  • dtype (string) – cast the data to the defined type
  • ltype (string) – cast the labels to the defined type
  • apply_transforms (bool) – apply the transforms to the data
  • percentaje (float) – value between 0 and 1 specifying the fraction of the data to which the transforms and cast are applied; a subset of that size is returned
desfragment()[source]

Concatenates the train, valid, and test data into a single data array, and the train, valid, and test labels into another array. Returns data, labels.

info(classes=False)[source]
Parameters:classes (bool) – if True, print the details of the labels

Prints the details of the dataset.

score_train_test()[source]

Returns the separability score between the train data and the test data.

shape

Returns the shape of the dataset.

class ds.DataSetBuilderImage(name=None, image_size=None, train_folder_path=None, **kwargs)[source]

Class for building image datasets. Gets the data from a directory tree where each subdirectory's name is the label.

Parameters:image_size (int) – define the image size to save in the dataset

kwargs are the same as DataSetBuilder's options.

Parameters:data_folder_path (string) – path to the data you want to add to the dataset; the data is split into train, test, and validation sets. If you want to split the data into train and test manually, check test_folder_path.
build_dataset()[source]

Extracts the data from the train_folder_path and then saves it.

images_to_dataset(folder_base)[source]
Parameters:folder_base (string path) – path where the images to convert live

Extracts the images from folder_base, which has the structure folder_base/label/.
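The folder_base/label/ convention means each subdirectory's name becomes the label of the images inside it. The helper below is a hypothetical sketch of that mapping, not the library's code:

```python
import os
import tempfile

# Hypothetical helper: walk folder_base/label/ and pair each image path
# with the name of its parent directory (the label).
def collect_labeled_paths(folder_base):
    samples = []
    for label in sorted(os.listdir(folder_base)):
        label_dir = os.path.join(folder_base, label)
        for fname in sorted(os.listdir(label_dir)):
            samples.append((os.path.join(label_dir, fname), label))
    return samples

# Build a tiny example tree: folder_base/cat/img0.png, folder_base/dog/img0.png
with tempfile.TemporaryDirectory() as folder_base:
    for label in ("cat", "dog"):
        os.makedirs(os.path.join(folder_base, label))
        open(os.path.join(folder_base, label, "img0.png"), "w").close()
    labels = [label for _, label in collect_labeled_paths(folder_base)]

print(labels)  # ['cat', 'dog']
```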

class ds.DataSetBuilderFile(name=None, train_folder_path=None, **kwargs)[source]

Class for building datasets from CSV files. Gets the data from a CSV file.

build_dataset(label_column=None)[source]
Parameters:label_column (string) – name of the column that contains the labels
from_csv(folder_path, label_column)[source]
Parameters:
  • folder_path (string) – path to the CSV file
  • label_column (string) – name of the column that contains the labels
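Conceptually, from_csv separates the label_column from the rest of the rows. The helper below is an assumed stand-in illustrating that split, not the library's implementation:

```python
import csv
import io

# Hypothetical sketch: pop the label column out of each CSV row,
# returning feature rows and labels separately.
def split_csv(fileobj, label_column):
    reader = csv.DictReader(fileobj)
    data, labels = [], []
    for row in reader:
        labels.append(row.pop(label_column))
        data.append([float(v) for v in row.values()])
    return data, labels

csv_text = "x,y,target\n1,2,a\n3,4,b\n"
data, labels = split_csv(io.StringIO(csv_text), "target")
print(data, labels)  # [[1.0, 2.0], [3.0, 4.0]] ['a', 'b']
```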
class ds.DataSetBuilderFold(n_splits=2)[source]

Class for creating dataset folds from existing datasets.

Parameters:n_splits (int) – number of splits to apply to the dataset
build_dataset(dataset=None)[source]
Parameters:dataset (DataLabel) – dataset to fold

Constructs the dataset folds from a DataSet class.

create_folds(dl)[source]
Parameters:dl (DataLabel) – datalabel to split

Returns an iterator over the datalabel split into n_splits DataSetBuilder datasets.

get_splits()[source]

Returns an iterator of datasets with the splits of the original data.
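The idea behind n_splits folding can be sketched in plain Python: each fold holds one held-out slice, with the remaining items as train data. This is an illustration of the concept only; the library yields DataSetBuilder datasets rather than tuples.

```python
# Hypothetical sketch of n_splits folding over a sequence of items.
def create_folds(items, n_splits=2):
    size = len(items) // n_splits
    for i in range(n_splits):
        test = items[i * size:(i + 1) * size]          # held-out slice
        train = items[:i * size] + items[(i + 1) * size:]  # the rest
        yield train, test

folds = list(create_folds(list(range(6)), n_splits=3))
print(folds[0])  # ([2, 3, 4, 5], [0, 1])
```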

class processing.Transforms[source]

This class stores the functions to be applied to the data.

transforms = Transforms()

transforms.add(function1, {'a': 1, 'b': 0}) -> function1(a=1, b=0)

transforms.add(function2, {'x': 10}) -> function2(x=10)

add(fn, **params)[source]
Parameters:
  • fn (function) – function to add
  • params (dict) – the parameters of the function fn

Adds to the class a function to use with the data.

apply(data)[source]
Parameters:data (array) – the data to which the added transforms are applied
empty()[source]

Returns True if no transforms were added.

classmethod from_json(json_transforms)[source]

Builds a Transforms instance from JSON format.

to_json()[source]

Converts this class to JSON format.
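The add/apply pipeline shown above can be sketched with a minimal stand-in class. This is an assumed illustration of the documented behaviour (functions queued with their parameters, then applied in order), not the library's implementation:

```python
# Minimal stand-in for the Transforms add/apply pipeline (sketch only).
class MiniTransforms:
    def __init__(self):
        self._fns = []

    def add(self, fn, **params):
        # Queue the function together with its keyword parameters.
        self._fns.append((fn, params))

    def empty(self):
        # True if no transforms were added.
        return not self._fns

    def apply(self, data):
        # Apply each queued transform, in order, to every element.
        for fn, params in self._fns:
            data = [fn(x, **params) for x in data]
        return data

def scale(x, a=1, b=0):
    return a * x + b

transforms = MiniTransforms()
transforms.add(scale, a=2, b=1)   # -> scale(x, a=2, b=1)
print(transforms.apply([1, 2, 3]))  # [3, 5, 7]
```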

class clf.measures.Measure(predictions, labels, labels2classes_fn)[source]

Distinct measures for evaluating the results of the predictors are defined in this class.

Parameters:
  • predictions (array) – array of predictions
  • labels (array) – array of correct labels, of type float, to compare with the predictions
  • labels2classes_fn (function) – function for transform the labels to classes
accuracy()[source]

Measure of correct predictions: true positives and true negatives.

auc()[source]

Area under the receiver operating characteristic curve; measures the true positive rate against the false positive rate.

f1()[source]

Weighted average of precision and recall.

logloss()[source]

Accuracy measure that penalises false classifications (logarithmic loss).

precision()[source]

Measure of false positive predictions.

recall()[source]

Measure of false negative predictions.
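The standard definitions behind accuracy, precision, recall, and f1 can be hand-rolled for a binary case. This sketch shows the definitions only; it is not how the library computes them internally:

```python
# Count the confusion-matrix cells for binary predictions (1 = positive).
def confusion(predictions, labels):
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 0)
    return tp, fp, fn, tn

def accuracy(predictions, labels):
    tp, fp, fn, tn = confusion(predictions, labels)
    return (tp + tn) / len(labels)          # correct / total

def precision(predictions, labels):
    tp, fp, fn, tn = confusion(predictions, labels)
    return tp / (tp + fp)                   # penalises false positives

def recall(predictions, labels):
    tp, fp, fn, tn = confusion(predictions, labels)
    return tp / (tp + fn)                   # penalises false negatives

def f1(predictions, labels):
    p, r = precision(predictions, labels), recall(predictions, labels)
    return 2 * p * r / (p + r)              # harmonic mean of p and r

preds, ys = [1, 0, 1, 1], [1, 0, 0, 1]
print(accuracy(preds, ys))  # 0.75
```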

class clf.measures.ListMeasure(headers=None, measures=None, order=None)[source]

Class for saving distinct measures.

Parameters:
  • headers (list) – lists of headers
  • measures (list) – list of values

list_measure = ListMeasure(headers=["classif", "f1"], measures=[["test", 0.5], ["test2", 0.6]])

add_measure(name, value, i=0, reverse=False)[source]
Parameters:
  • name (string) – column name
  • value (float) – value to add
drop_empty_columns()[source]

drop empty columns

empty_columns()[source]

return a set of indexes of empty columns

get_measure(name)[source]
Parameters:name (string) – name of the column whose values you want to get
measures_to_dict()[source]

convert the matrix to a dictionary

print_scores(order_column=None)[source]
Parameters:
  • order_column (string) – name of the column to order the matrix by
  • reverse (bool) – if False the order is ascending, else descending

Prints the matrix.
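Ordering a headers/measures table by a named column, as print_scores(order_column=...) does, can be sketched with the example values given above. The `order_by` helper is an assumed name, not part of the library:

```python
# The headers/measures layout from the ListMeasure example above.
headers = ["classif", "f1"]
measures = [["test", 0.5], ["test2", 0.6]]

# Hypothetical sketch: sort the rows by the column named order_column.
def order_by(headers, rows, order_column, reverse=False):
    col = headers.index(order_column)
    return sorted(rows, key=lambda r: r[col], reverse=reverse)

print(order_by(headers, measures, "f1", reverse=True))
# [['test2', 0.6], ['test', 0.5]]
```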

class detector.HOG(model_name=None, check_point_path=None, model_version=None, transforms=None)[source]

Creates a histogram of oriented gradients (HOG) detector. You need the dlib library and its Python bindings to use this class.

Parameters:
  • model_name (string) – Name of the model
  • check_point_path (string) – path where the model will be saved, this param is taken from settings
  • model_version (string) – a string number to identify the different models
  • transforms (Transforms) – the transforms to apply to the data
detector()[source]

Returns a dlib.simple_object_detector.

draw_detections(pictures)[source]
Parameters:pictures (list) – list of paths of pictures in which to search for bounding boxes.

Draws the bounding boxes predicted by the trained model.

load()[source]

Loads the metadata saved after training.

scores(measures=None)[source]
Parameters:measures (list) – list of measure names to show in the scores table.
test()[source]

Tests the trained model.

train(xml_filename)[source]
Parameters:xml_filename (string) – name of the file where the bounding boxes are defined