skhubness.neighbors.NNG
class skhubness.neighbors.NNG(n_candidates: int = 5, metric: str = 'euclidean', index_dir: str = 'auto', optimize: bool = False, edge_size_for_creation: int = 80, edge_size_for_search: int = 40, num_incoming: int = -1, num_outgoing: int = -1, epsilon: float = 0.1, n_jobs: int = 1, verbose: int = 0)
Wrapper for ngtpy and NNG variants.
By default, the graph is an ANNG. It is optimized into an ONNG only when the optimize parameter is set.
- Parameters
- n_candidates: int, default = 5
Number of neighbors to retrieve
- metric: str, default = ‘euclidean’
Distance metric, allowed are ‘manhattan’, ‘L1’, ‘euclidean’, ‘L2’, ‘minkowski’, ‘Angle’, ‘Normalized Angle’, ‘Hamming’, ‘Jaccard’, ‘Cosine’ or ‘Normalized Cosine’.
- index_dir: str, default = ‘auto’
Directory in which to store the index. If None, keep the index in main memory (the index is then NOT pickleable). If index_dir is a string, it is interpreted as the directory to store the index in. If ‘auto’, create a temporary directory for the index, preferably in /dev/shm on Linux. Note: The directory and the index will NOT be deleted automatically.
- optimize: bool, default = False
Use the ONNG method by optimizing the ANNG graph. Index creation may take a long time.
- edge_size_for_creation: int, default = 80
Increasing the ANNG edge size used during index creation improves retrieval accuracy at the cost of more time.
- edge_size_for_search: int, default = 40
Increasing the ANNG edge size used during search improves retrieval accuracy at the cost of more time.
- epsilon: float, default = 0.1
Trade-off in ANNG between higher accuracy (larger epsilon) and shorter query time (smaller epsilon)
- num_incoming: int, default = -1
Number of incoming edges in ONNG graph
- num_outgoing: int, default = -1
Number of outgoing edges in ONNG graph
- n_jobs: int, default = 1
Number of parallel jobs
- verbose: int, default = 0
Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.
Notes
NNG stores the index to a directory specified in index_dir. The index is persistent, and will NOT be deleted automatically. It is the user’s responsibility to take care of deletion, when required.
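Because the index directory is never removed automatically, one option is to manage it explicitly. The following is a minimal sketch using only the standard library; the helper name `scratch_index_dir` is hypothetical and not part of skhubness, and the NNG calls are shown only as comments because they assume skhubness and ngtpy are installed.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def scratch_index_dir(prefix: str = "nng_index_"):
    """Yield a fresh directory and delete it (with its contents) on exit.

    Hypothetical helper for illustration, not part of skhubness.
    """
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # NNG does not remove its index, so clean up explicitly.
        shutil.rmtree(path, ignore_errors=True)


with scratch_index_dir() as index_dir:
    # Assumed usage (requires skhubness and ngtpy):
    #   nng = NNG(n_candidates=5, index_dir=str(index_dir)).fit(X)
    #   distances, indices = nng.kneighbors()
    existed_inside = index_dir.is_dir()

removed_afterwards = not index_dir.exists()
```

With this pattern the index only lives as long as the `with` block, so it suits short-lived experiments rather than indexes that should be reused across sessions.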
- Attributes
- valid_metrics:
List of valid distance metrics/measures
__init__(self, n_candidates: 'int' = 5, metric: 'str' = 'euclidean', index_dir: 'str' = 'auto', optimize: 'bool' = False, edge_size_for_creation: 'int' = 80, edge_size_for_search: 'int' = 40, num_incoming: 'int' = -1, num_outgoing: 'int' = -1, epsilon: 'float' = 0.1, n_jobs: 'int' = 1, verbose: 'int' = 0)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__(self, n_candidates, metric, …): Initialize self.
- fit(self, X[, y]): Build the ngtpy.Index and insert data from X.
- get_params(self[, deep]): Get parameters for this estimator.
- kneighbors(self[, X, n_candidates, …]): Retrieve k nearest neighbors.
- set_params(self, **params): Set the parameters of this estimator.
Attributes
internal_distance_type
valid_metrics
fit(self, X, y=None) → 'NNG'
Build the ngtpy.Index and insert data from X.
- Parameters
- X: np.array
Data to be indexed
- y: any
Ignored
- Returns
- self: NNG
An instance of NNG with a built index
get_params(self, deep=True)
Get parameters for this estimator.
- Parameters
- deep: boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: mapping of string to any
Parameter names mapped to their values.
kneighbors(self, X=None, n_candidates=None, return_distance=True) → 'Union[Tuple[np.array, np.array], np.array]'
Retrieve k nearest neighbors.
- Parameters
- X: np.array or None, optional, default = None
Query objects. If None, search among the indexed objects.
- n_candidates: int or None, optional, default = None
Number of neighbors to retrieve. If None, use the value passed during construction.
- return_distance: bool, default = True
If True, return both the distances to the neighbors and their indices. Otherwise, return only the indices.
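To make the return layout concrete, here is a small exhaustive-search reference written in plain Python. It is not how NNG searches (NNG queries an approximate graph index via ngtpy, and returns NumPy arrays rather than lists); the function name `exact_kneighbors` is hypothetical, it supports only the Euclidean metric, and it requires explicit query objects. A reference like this can be useful for checking approximate results on small data.

```python
import math
from typing import List, Sequence, Tuple, Union

Vector = Sequence[float]


def exact_kneighbors(
    indexed: Sequence[Vector],
    queries: Sequence[Vector],
    n_candidates: int = 5,
    return_distance: bool = True,
) -> Union[Tuple[List[List[float]], List[List[int]]], List[List[int]]]:
    """Exact Euclidean k-NN by exhaustive search (illustrative reference only)."""
    all_distances, all_indices = [], []
    for query in queries:
        # Distance from the query to every indexed object, paired with its row index.
        scored = sorted((math.dist(query, x), i) for i, x in enumerate(indexed))
        nearest = scored[:n_candidates]
        all_distances.append([d for d, _ in nearest])
        all_indices.append([i for _, i in nearest])
    if return_distance:
        # Same ordering as documented above: distances first, then indices.
        return all_distances, all_indices
    return all_indices


indexed = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
queries = [(0.1, 0.0)]
distances, indices = exact_kneighbors(indexed, queries, n_candidates=2)
```

Each query yields one row of `n_candidates` entries, sorted by increasing distance.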
set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Returns
- self
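The `<component>__<parameter>` naming can be illustrated with a short sketch that separates an estimator's own parameters from those addressed to a nested component. This is an assumption-laden illustration of the naming convention, not the library's implementation; the function `split_nested_params` and the component name `nng` are hypothetical.

```python
from typing import Any, Dict, Tuple


def split_nested_params(
    params: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Separate top-level parameters from <component>__<parameter> entries."""
    own: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        # Split on the first double underscore only; the remainder may itself be nested.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": 1, "nng__epsilon": 0.2, "nng__n_jobs": 4}
)
```

Here `own` holds the parameters of the outer object, while `nested["nng"]` collects the values that would be forwarded to the component named `nng`.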