Welcome to mlPyp’s documentation!¶
A framework for machine learning pipelines.
Highlights:
- Build datasets with train, test, and validation data, with transformations applied.
- Build datasets with metadata for reproducible experiments.
- Easily add different machine learning algorithms from Keras, scikit-learn, etc.
- Models with scores and predictors.
- Convert CSV files to datasets.
- Use transformations to manipulate data (images).

Installation¶
git clone https://github.com/elaeon/ML.git
You can install the Python dependencies with pip, but we strongly recommend installing them with conda and conda-forge.
conda config --add channels conda-forge
conda create -n new_environment --file ML/requirements.txt
source activate new_environment
pip install ML/
Quick start¶
First, build a dataset:
from ml.ds import DataSetBuilder
import numpy as np

DIM = 21
SIZE = 100000
# Random features: SIZE rows with DIM columns each
X = np.random.rand(SIZE, DIM)
# Binary labels: 1 if the sum of the noisy sin(6*X) row is positive, else 0
Y = np.asarray([1 if sum(row) > 0 else 0
    for row in np.sin(6*X) + 0.1*np.random.randn(SIZE, 1)])
dataset_name = "test_dataset"
dataset = DataSetBuilder(
    dataset_name,
    validator="cross")
dataset.build_dataset(X, Y)
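The highlights mention datasets with train, test, and validation data. As a rough illustration of what such a split involves, here is a minimal sketch in plain Python. It is not mlPyp's implementation: `split_indices`, its 70/10/20 fractions, and the `seed` parameter are hypothetical names chosen for this example, and the behaviour of `validator="cross"` itself is not reproduced here.

```python
import random


def split_indices(n_rows, train=0.7, valid=0.1, seed=0):
    """Shuffle row indices and cut them into train/valid/test parts.

    The fractions are illustrative defaults, not values taken from mlPyp.
    """
    if not 0 < train + valid < 1:
        raise ValueError("train + valid must leave room for a test split")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_train = int(n_rows * train)
    n_valid = int(n_rows * valid)
    return (indices[:n_train],
            indices[n_train:n_train + n_valid],
            indices[n_train + n_valid:])  # whatever is left becomes the test part


train_idx, valid_idx, test_idx = split_indices(1000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Splitting indices rather than the data itself keeps the features and labels aligned: the same index list can be used to select rows from both X and Y.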
Then pass it to a classification model for training. In this case we use SVGPC (a Gaussian process with stochastic variational inference). Once training has finished, you can predict on some data.
from ml.clf.extended.w_gpy import SVGPC
classif = SVGPC(
    dataset=dataset,
    model_name="my_test_model",
    model_version="1")
classif.train(batch_size=128, num_steps=10)
classif.scores().print_scores(order_column="f1")
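The scores above are ordered by the `f1` column. Assuming that column refers to the usual F1 score (the harmonic mean of precision and recall), the following sketch shows how it is computed for binary labels. The function `f1_score` here is a hypothetical helper written for this example, not part of mlPyp.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score of `y_pred` against `y_true` for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
score = f1_score(y_true, y_pred)
print(score)  # 0.75 (3 true positives, 1 false positive, 1 false negative)
```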
To make predictions with SVGPC:
classif = SVGPC(
    model_name="my_test_model",
    model_version="1")
predictions = np.asarray(list(classif.predict(X, chunk_size=258)))
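The call above wraps `predict` in `list(...)`, which suggests that predictions are produced lazily, one chunk of rows at a time. The sketch below shows that general pattern in plain Python. It is an assumption about the idea behind `chunk_size`, not mlPyp's code: `predict_in_chunks` and `threshold_model` are hypothetical names used only for illustration.

```python
def predict_in_chunks(rows, predict_chunk, chunk_size=258):
    """Yield predictions lazily, scoring at most `chunk_size` rows at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # Only one chunk of inputs is handed to the model at any moment
        for prediction in predict_chunk(chunk):
            yield prediction


def threshold_model(chunk):
    """Hypothetical stand-in for a trained model: 1 if the row sum is positive."""
    return [1 if sum(row) > 0 else 0 for row in chunk]


rows = [[-1.0, 0.5], [2.0, 1.0], [0.1, -0.3], [0.4, 0.4], [-2.0, 1.0]]
predictions = list(predict_in_chunks(rows, threshold_model, chunk_size=2))
print(predictions)  # [0, 1, 0, 1, 0]
```

Because the function is a generator, nothing is computed until the caller iterates over it, which keeps memory use bounded by the chunk size rather than by the size of the whole input.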
More models are available (see Extra models). You can also extend the base model and make your own predictors! For more information, see the Models section.
CLI¶
mlPyp has a CLI for managing your datasets and models. For example:
ml datasets
Returns a table of previously built datasets:
Total size: 6.75 MB
dataset            size     date
-----------------  -------  --------------------
numbers_tickets    2.27 MB  2017-01-26T22:25 UTC
numbers_tickets_d  4.48 MB  2017-01-25T17:01 UTC
Or
ml models
Returns
classif    model name    version    dataset    group
---------  ------------  ---------  ---------  -------
Boosting   numerai       1          numerai
SVGPC      test2         1          test2      basic
You can use "--help" to view more options.
Support¶
If you encounter bugs, please let me know.