skhubness.neighbors.RandomProjectionTree

class skhubness.neighbors.RandomProjectionTree(n_candidates: int = 5, metric: str = 'euclidean', n_trees: int = 10, search_k: int = -1, mmap_dir: str = 'auto', n_jobs: int = 1, verbose: int = 0)[source]

Wrapper for using annoy.AnnoyIndex

Annoy is an approximate nearest neighbor library that builds a forest of random projection trees.
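
To make the idea of a forest of random projection trees concrete, here is a minimal pure-Python sketch. It is not Annoy's implementation and not this wrapper's code: the helper names (`build_tree`, `query_forest`, `leaf_size`) are hypothetical, and the search is simplified to one leaf per tree, whereas Annoy searches all trees together with a priority queue bounded by search_k.

```python
import math
import random


def _side(point, normal, offset):
    """Return True if `point` lies on the 'left' side of the hyperplane."""
    return sum(n * x for n, x in zip(normal, point)) < offset


def build_tree(points, ids, rng, leaf_size=8):
    """Recursively split `ids` with random hyperplanes until leaves are small."""
    if len(ids) <= leaf_size:
        return ids  # leaf: a plain list of point indices
    a, b = rng.sample(ids, 2)
    # Hyperplane equidistant from two randomly chosen points a and b.
    normal = [pb - pa for pa, pb in zip(points[a], points[b])]
    offset = sum(n * (pa + pb) / 2
                 for n, pa, pb in zip(normal, points[a], points[b]))
    left = [i for i in ids if _side(points[i], normal, offset)]
    right = [i for i in ids if not _side(points[i], normal, offset)]
    if not left or not right:
        return ids  # degenerate split (e.g. duplicate points): stop here
    return (normal, offset,
            build_tree(points, left, rng, leaf_size),
            build_tree(points, right, rng, leaf_size))


def query_tree(node, q):
    """Descend from the root to the leaf whose region contains `q`."""
    while isinstance(node, tuple):
        normal, offset, left, right = node
        node = left if _side(q, normal, offset) else right
    return node


def query_forest(points, forest, q, k):
    """Pool the leaf candidates of all trees, then rank them exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(query_tree(tree, q))
    ranked = sorted(candidates, key=lambda i: math.dist(points[i], q))
    return ranked[:k]


rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
forest = [build_tree(points, list(range(len(points))), rng) for _ in range(10)]
neighbors = query_forest(points, forest, points[0], k=5)
```

Each additional tree contributes another leaf of candidates, which is why more trees tend to improve recall at the cost of a larger index.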

Parameters
n_candidates: int, default = 5

Number of neighbors to retrieve

metric: str, default = ‘euclidean’

Distance metric. Allowed values are “angular”, “euclidean”, “manhattan”, “hamming”, and “dot”.

n_trees: int, default = 10

Build a forest of n_trees trees. More trees give higher precision when querying, but increase build time and index size.

search_k: int, default = -1

Query will inspect search_k nodes. A larger value gives more accurate results, but queries take longer.

mmap_dir: str, default = ‘auto’

Memory-map the index to the given directory. This is required to make the class pickleable. If None, keep everything in main memory (the index is then NOT pickleable). If mmap_dir is a string, it is interpreted as a directory in which to store the index. If ‘auto’, create a temporary directory for the index, preferably in /dev/shm on Linux.

n_jobs: int, default = 1

Number of parallel jobs

verbose: int, default = 0

Verbosity level. If verbose > 0, show a tqdm progress bar during indexing and querying.
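
The three documented cases of mmap_dir can be sketched as follows. This is an illustration of the described behavior only, not the library's code: `resolve_index_dir` and the `rptree_` prefix are hypothetical, and creating a missing directory for the string case is an assumption.

```python
import os
import tempfile


def resolve_index_dir(mmap_dir):
    """Hypothetical helper mirroring the three documented mmap_dir cases."""
    if mmap_dir is None:
        return None  # keep the index in main memory (not pickleable)
    if mmap_dir == "auto":
        # Prefer the RAM-backed /dev/shm on Linux, else the default temp location.
        shm = "/dev/shm"
        usable = os.path.isdir(shm) and os.access(shm, os.W_OK)
        return tempfile.mkdtemp(prefix="rptree_", dir=shm if usable else None)
    # Assumption: a user-supplied directory is created if it does not exist yet.
    os.makedirs(mmap_dir, exist_ok=True)
    return mmap_dir


index_dir = resolve_index_dir("auto")
```

A file-backed index is what makes pickling feasible: the pickled object only needs to remember where the index lives on disk.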

Attributes
valid_metrics:

List of valid distance metrics/measures

__init__(self, n_candidates: 'int' = 5, metric: 'str' = 'euclidean', n_trees: 'int' = 10, search_k: 'int' = -1, mmap_dir: 'str' = 'auto', n_jobs: 'int' = 1, verbose: 'int' = 0)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, n_candidates, metric, …)

Initialize self.

fit(self, X[, y])

Build the annoy.AnnoyIndex and insert data from X.

get_params(self[, deep])

Get parameters for this estimator.

kneighbors(self[, X, n_candidates, …])

Retrieve k nearest neighbors.

set_params(self, **params)

Set the parameters of this estimator.

Attributes

valid_metrics

fit(self, X, y=None) → 'RandomProjectionTree'[source]

Build the annoy.AnnoyIndex and insert data from X.

Parameters
X: np.array

Data to be indexed

y: any

Ignored

Returns
self: RandomProjectionTree

An instance of RandomProjectionTree with a built index

get_params(self, deep=True)

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

kneighbors(self, X=None, n_candidates=None, return_distance=True) → 'Union[Tuple[np.array, np.array], np.array]'[source]

Retrieve k nearest neighbors.

Parameters
X: np.array or None, optional, default = None

Query objects. If None, search among the indexed objects.

n_candidates: int or None, optional, default = None

Number of neighbors to retrieve. If None, use the value passed during construction.

return_distance: bool, default = True

If True, return the distances and indices of the neighbors. Otherwise, return only the indices.
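
The call pattern above can be illustrated with an exact brute-force stand-in. It is not the approximate search this class performs, and `exact_kneighbors` is a hypothetical name. It returns nested lists rather than np.array, uses the (distances, indices) order that the description suggests, and assumes that a point is not reported as its own neighbor when X is None.

```python
import math


def exact_kneighbors(indexed, X=None, n_candidates=5, return_distance=True):
    """Exact brute-force stand-in that mimics the documented call pattern."""
    queries = indexed if X is None else X
    all_dist, all_ind = [], []
    for qi, q in enumerate(queries):
        ranked = sorted(
            (math.dist(q, p), i)
            for i, p in enumerate(indexed)
            if X is not None or i != qi  # assumption: skip the query point itself
        )[:n_candidates]
        all_dist.append([d for d, _ in ranked])
        all_ind.append([i for _, i in ranked])
    return (all_dist, all_ind) if return_distance else all_ind


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]

# X=None: every indexed point is queried against the others.
dist, ind = exact_kneighbors(data, n_candidates=2)

# New query objects, indices only.
ind_only = exact_kneighbors(data, X=[[0.9, 0.1]], n_candidates=1,
                            return_distance=False)
```

In both cases there is one row per query object and one column per retrieved neighbor.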

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
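
The <component>__<parameter> naming can be illustrated with a small sketch that groups such keys by component. This only demonstrates the naming convention, not the actual set_params implementation; `split_nested_params` and the component name `index` are hypothetical.

```python
def split_nested_params(params):
    """Group keys of the form '<component>__<parameter>' by component."""
    own, nested = {}, {}
    for key, value in params.items():
        component, delim, sub_key = key.partition("__")
        if delim:
            # Routed to the named component, which receives the remaining key.
            nested.setdefault(component, {})[sub_key] = value
        else:
            # No double underscore: a parameter of the estimator itself.
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": 1, "index__n_trees": 50, "index__metric": "angular"}
)
```

Here `own` holds the estimator's direct parameters, while `nested` maps each component name to the parameters destined for it.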