qlearnkit.datasets package

Submodules

qlearnkit.datasets.breast_cancer module

qlearnkit.datasets.breast_cancer.load_breast_cancer(train_size: Optional[Union[float, int]] = None, test_size: Optional[Union[float, int]] = None, n_features: Optional[int] = None, *, use_pca: Optional[bool] = False, return_bunch: Optional[bool] = False)

Loads the breast cancer dataset from sklearn and splits it according to the requested train size, test size, and number of features.

Parameters
  • test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

  • n_features – number of desired features

  • use_pca – whether to use PCA for dimensionality reduction; default False

  • return_bunch – whether to return a Bunch (similar to a dictionary) or not

Returns

Breast Cancer dataset as available in sklearn

qlearnkit.datasets.dataset_helper module

qlearnkit.datasets.dataset_helper.features_labels_from_data(X: Union[numpy.ndarray, list], y: Union[numpy.ndarray, list], train_size: Optional[Union[float, int]] = None, test_size: Optional[Union[float, int]] = None, n_features: Optional[int] = None, *, use_pca: Optional[bool] = False, return_bunch: Optional[bool] = False)

Splits a dataset according to the requested train size, test size, and number of features.

Parameters
  • X – raw data from dataset

  • y – labels from dataset

  • test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

  • n_features – number of desired features

  • use_pca – whether to use PCA for dimensionality reduction; default False

  • return_bunch – whether to return a sklearn.utils.Bunch (similar to a dictionary) or not

Returns

Preprocessed dataset as available in sklearn
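The way train_size and test_size combine can be easier to follow as code. The sketch below is not the qlearnkit implementation: resolve_split_sizes is a hypothetical helper, and it assumes the rounding used by scikit-learn's train_test_split (test counts round up, train counts round down), which the parameter descriptions above appear to mirror.

```python
import math
from typing import Optional, Tuple, Union

Size = Optional[Union[float, int]]


def resolve_split_sizes(n_samples: int, train_size: Size = None,
                        test_size: Size = None) -> Tuple[int, int]:
    """Hypothetical helper: turn train_size/test_size into sample counts."""
    # Both unset: fall back to a 25% test split, as the parameter docs state.
    if train_size is None and test_size is None:
        test_size = 0.25

    def to_count(size, rounder):
        # Floats are proportions of the dataset; ints are absolute counts.
        if size is None:
            return None
        if isinstance(size, float):
            if not 0.0 < size < 1.0:
                raise ValueError("float sizes must lie strictly between 0.0 and 1.0")
            return int(rounder(size * n_samples))
        return int(size)

    # Assumption: test counts round up and train counts round down,
    # mirroring scikit-learn's train_test_split.
    n_test = to_count(test_size, math.ceil)
    n_train = to_count(train_size, math.floor)

    # A missing size is the complement of the other one.
    if n_train is None:
        n_train = n_samples - n_test
    if n_test is None:
        n_test = n_samples - n_train

    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the number of samples")
    return n_train, n_test


print(resolve_split_sizes(150))                  # (112, 38)
print(resolve_split_sizes(150, train_size=0.5))  # (75, 75)
print(resolve_split_sizes(150, test_size=30))    # (120, 30)
```

With 150 samples and neither size given, the 0.25 default yields 38 test samples (37.5 rounded up) and 112 train samples.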

qlearnkit.datasets.dataset_helper.label_to_class_name(predicted_labels, classes) → List[str]

Helper that converts numeric labels to class names (strings).

Parameters
  • predicted_labels (numpy.ndarray) – Nx1 array

  • classes (dict or list) – a mapping from label (numeric) to class name (str)

Returns

list of predicted class names of each datum

Example

    from qlearnkit.datasets.dataset_helper import label_to_class_name

    classes = ['sepal length (cm)',
               'sepal width (cm)',
               'petal length (cm)',
               'petal width (cm)']
    predicted_labels = [0, 2, 1, 2, 0]
    print(label_to_class_name(predicted_labels, classes))
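Because classes may be either a dict or a list, it can help to see why both behave the same. The following is a minimal stand-in, not the library code: labels_to_names is a hypothetical name, the class names are the usual iris species names chosen for illustration, and the sketch assumes a flat sequence of integer labels.

```python
from typing import Dict, List, Sequence, Union


def labels_to_names(predicted_labels: Sequence[int],
                    classes: Union[Dict[int, str], List[str]]) -> List[str]:
    """Illustrative stand-in for label_to_class_name (not the library code)."""
    # Indexing works the same way for a list (by position) and a dict (by key).
    return [classes[int(label)] for label in predicted_labels]


as_list = ["setosa", "versicolor", "virginica"]
as_dict = {0: "setosa", 1: "versicolor", 2: "virginica"}

print(labels_to_names([0, 2, 1, 2, 0], as_list))
# ['setosa', 'virginica', 'versicolor', 'virginica', 'setosa']
print(labels_to_names([0, 2, 1, 2, 0], as_dict))
# ['setosa', 'virginica', 'versicolor', 'virginica', 'setosa']
```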

qlearnkit.datasets.dataset_helper.pca_reduce(X_train: numpy.ndarray, X_test: numpy.ndarray, n_components: int = 2) → Tuple[numpy.ndarray, numpy.ndarray]
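The signature above is all the source documents for pca_reduce. As a rough illustration of what a function with this shape might do, the sketch below fits the principal axes on the training split only and projects both splits onto them. That behaviour is an assumption rather than confirmed library behaviour, and pca_reduce_sketch is a hypothetical NumPy-only stand-in.

```python
from typing import Tuple

import numpy as np


def pca_reduce_sketch(X_train: np.ndarray, X_test: np.ndarray,
                      n_components: int = 2) -> Tuple[np.ndarray, np.ndarray]:
    """Illustrative PCA projection; not the qlearnkit implementation."""
    # Centre both splits with the training mean only, so that no
    # statistics from the test split influence the projection.
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    components = vt[:n_components]
    return (X_train - mean) @ components.T, (X_test - mean) @ components.T


rng = np.random.default_rng(0)
train_2d, test_2d = pca_reduce_sketch(rng.normal(size=(20, 5)),
                                      rng.normal(size=(8, 5)))
print(train_2d.shape, test_2d.shape)  # (20, 2) (8, 2)
```

Fitting on the training split alone keeps the test split unseen during preprocessing, which is the usual reason such a helper takes both arrays separately.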

qlearnkit.datasets.iris module

qlearnkit.datasets.iris.load_iris(train_size: Optional[Union[float, int]] = None, test_size: Optional[Union[float, int]] = None, n_features: Optional[int] = None, *, use_pca: Optional[bool] = False, return_bunch: Optional[bool] = False)

Loads the iris dataset from sklearn and splits it according to the requested train size, test size, and number of features.

Parameters
  • test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

  • n_features – number of desired features

  • use_pca – whether to use PCA for dimensionality reduction; default False

  • return_bunch – whether to return a Bunch (similar to a dictionary) or not

Returns

Iris dataset as available in sklearn

qlearnkit.datasets.wine module

qlearnkit.datasets.wine.load_wine(train_size: Optional[Union[float, int]] = None, test_size: Optional[Union[float, int]] = None, n_features: Optional[int] = None, *, use_pca: Optional[bool] = False, return_bunch: Optional[bool] = False)

Loads the wine dataset from sklearn and splits it according to the requested train size, test size, and number of features.

Parameters
  • test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

  • n_features – number of desired features

  • use_pca – whether to use PCA for dimensionality reduction; default False

  • return_bunch – whether to return a Bunch (similar to a dictionary) or not

Returns

Wine dataset as available in sklearn

Module contents

qlearnkit.datasets.load_breast_cancer(train_size: Optional[Union[float, int]] = None, test_size: Optional[Union[float, int]] = None, n_features: Optional[int] = None, *, use_pca: Optional[bool] = False, return_bunch: Optional[bool] = False)

Loads the breast cancer dataset from sklearn and splits it according to the requested train size, test size, and number of features.

Parameters
  • test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

  • n_features – number of desired features

  • use_pca – whether to use PCA for dimensionality reduction; default False

  • return_bunch – whether to return a Bunch (similar to a dictionary) or not

Returns

Breast Cancer dataset as available in sklearn

qlearnkit.datasets.load_iris(train_size: Optional[Union[float, int]] = None, test_size: Optional[Union[float, int]] = None, n_features: Optional[int] = None, *, use_pca: Optional[bool] = False, return_bunch: Optional[bool] = False)

Loads the iris dataset from sklearn and splits it according to the requested train size, test size, and number of features.

Parameters
  • test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

  • n_features – number of desired features

  • use_pca – whether to use PCA for dimensionality reduction; default False

  • return_bunch – whether to return a Bunch (similar to a dictionary) or not

Returns

Iris dataset as available in sklearn

qlearnkit.datasets.load_wine(train_size: Optional[Union[float, int]] = None, test_size: Optional[Union[float, int]] = None, n_features: Optional[int] = None, *, use_pca: Optional[bool] = False, return_bunch: Optional[bool] = False)

Loads the wine dataset from sklearn and splits it according to the requested train size, test size, and number of features.

Parameters
  • test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

  • n_features – number of desired features

  • use_pca – whether to use PCA for dimensionality reduction; default False

  • return_bunch – whether to return a Bunch (similar to a dictionary) or not

Returns

Wine dataset as available in sklearn