MVtSNE
Multiview t-distributed Stochastic Neighbour Embedding
MV-tSNE with log-linear opinion pooling, using genetic algorithms (GAs) to optimize the weights of each view (opinion).
It computes the multiview tSNE embedding of data from a list of feature matrices or distance matrices (or a mix of both), which are assumed to be different views of the same data. The functionality is split into two parts, one that computes P and another that performs tSNE on a given P, plus an “interface” function that does everything in an efficient way.
class multiview.mvtsne.MvtSNE(k=2, initial_dims=30, perplexity=30, max_iter=1000, min_cost=0, epoch_callback=None, whiten=True, epoch=100, random_state=0)
Multiview tSNE using an expert opinion pooling on the input probability matrices.
Given a list of input views and other parameters, mvtsne computes a neighbouring probability matrix for each input view, then finds the optimal set of weights to combine these matrices using a log-linear pool. The pooled probability matrix is then used as input to the standard tSNE procedure, where the probability matrix of the output space is fitted to the pooled probability matrix by minimizing the Kullback-Leibler divergence.
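The pooling step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name log_linear_pool, the eps guard and the toy matrices are assumptions made for the example, and the weight search (done with GAs in MV-tSNE) is not shown. A log-linear pool is a normalised weighted geometric mean of the per-view probability matrices.

```python
import numpy as np

def log_linear_pool(prob_matrices, weights, eps=1e-12):
    """Hypothetical helper: normalised weighted geometric mean of the views."""
    log_pool = np.zeros_like(prob_matrices[0], dtype=float)
    for p, w in zip(prob_matrices, weights):
        # eps guards against log(0) on the zero diagonal.
        log_pool += w * np.log(np.maximum(p, eps))
    pooled = np.exp(log_pool)
    np.fill_diagonal(pooled, 0.0)   # a point is not its own neighbour
    return pooled / pooled.sum()    # renormalise to a probability matrix

# Two toy symmetric 3x3 probability matrices (each sums to 1).
p1 = np.array([[0.00, 0.25, 0.15],
               [0.25, 0.00, 0.10],
               [0.15, 0.10, 0.00]])
p2 = np.array([[0.00, 0.10, 0.20],
               [0.10, 0.00, 0.20],
               [0.20, 0.20, 0.00]])

pooled = log_linear_pool([p1, p2], weights=[0.5, 0.5])
```

With weights [1.0, 0.0] the pool reduces to the first view, which is a quick sanity check on the weighting.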
Notes
All input views must have the same number of samples (rows).
Parameters:
- k (int, default: 2) – The desired dimension of the resulting embedding.
- initial_dims (int, default: 30) – Number of dimensions to use in the reduction method.
- perplexity (int, default: 30) – The perplexity parameter, roughly equivalent to the optimal number of neighbours.
- max_iter (int, default: 1000) – Maximum number of iterations to perform.
- min_cost (numeric, default: 0) – The minimum cost value (error) at which to stop iterating.
- epoch_callback (callable, default: None) – A callback function to be called after each epoch (an epoch is a number of iterations controlled by the parameter epoch, see below).
- whiten (bool, default: True) – Whether the data matrices should be whitened.
- epoch (int, default: 100) – The number of iterations between update messages.
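To make the "number of neighbours" reading of perplexity concrete, the sketch below computes the perplexity of a single neighbour distribution as 2 raised to its Shannon entropy, which is the usual tSNE definition. The function name and the toy distributions are assumptions for illustration only; they are not part of this package.

```python
import numpy as np

def perplexity(p):
    """Perplexity of a discrete distribution: 2 ** (Shannon entropy in bits)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log(0) is treated as 0
    entropy = -np.sum(p * np.log2(p))
    return 2.0 ** entropy

# Probability mass spread evenly over 5 neighbours.
uniform_over_5 = np.full(5, 0.2)
# Almost all mass on a single neighbour.
peaked = np.array([0.96, 0.01, 0.01, 0.01, 0.01])

print(perplexity(uniform_over_5))
print(perplexity(peaked))
```

A uniform distribution over 5 neighbours has perplexity 5, while the peaked one has a perplexity close to 1, so the value behaves like an effective neighbour count.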
References
Abbas, Ali E. 2009. “A Kullback-Leibler View of Linear and Log-Linear Pools.” Decision Analysis 6 (1): 25–37. doi:10.1287/deca.1080.0133.
Carvalho, Arthur, and Kate Larson. 2012. “A Consensual Linear Opinion Pool.” http://arxiv.org/abs/1204.5399.
Van der Maaten, Laurens, and Geoffrey Hinton. 2008. “Visualizing Data using t-SNE.” Journal of Machine Learning Research 9: 2579–2605.
fit(X, is_distance)
Applies the standard tSNE algorithm to the input multiview data. Returns the weights used in the algorithm and the probability matrix.
Notes
All input views must have the same number of samples (rows).
Parameters:
- X (list) – A list of feature matrices or distance matrices, where each matrix is one of the views of the dataset.
- is_distance (array-like) – A list or array which indicates whether the matrix with the same index in X is a distance matrix (true value) or not (false value).
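As an illustration of how X and is_distance line up, the sketch below builds one feature view and one precomputed distance view of the same three samples. The helper pairwise_euclidean is hypothetical, and the choice of Euclidean distance is an assumption made for the example; the source does not state which metric the library applies to feature matrices.

```python
import numpy as np

def pairwise_euclidean(features):
    """Hypothetical helper: (n_samples, n_samples) Euclidean distance matrix."""
    x = np.asarray(features, dtype=float)
    diff = x[:, None, :] - x[None, :, :]      # all pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three samples described by two features.
features = np.array([[0.0, 0.0],
                     [3.0, 4.0],
                     [6.0, 8.0]])
distances = pairwise_euclidean(features)

# One entry of is_distance per view, in the same order as X.
X = [features, distances]
is_distance = [False, True]
```

Both views have three rows, which satisfies the requirement that all views describe the same samples.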
fit_transform(X, is_distance)
Applies the standard tSNE algorithm to the input multiview data and returns the resulting embedding together with the weights of each view.
Notes
All input views must have the same number of samples (rows).
Parameters:
- X (list) – A list of feature matrices or distance matrices, where each matrix is one of the views of the dataset.
- is_distance (array-like) – A list or array which indicates whether the matrix with the same index in X is a distance matrix (true value) or not (false value).
embedding_
ndarray – Embedded space.
weights_
ndarray – Ideal weights used in the embedding.
Returns: output – A tuple with two elements:
- embedding, the k-dimensional embedding of the input samples
- weights, the weights associated with each input data view.
Return type: tuple
Raises: ValueError – If the matrices are not square matrices, the k value is negative, the data samples and the is_distance parameter do not have the same length, or scalar parameters are negative.
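The checks listed above can be sketched as a small validation routine. This is a hypothetical helper, not the library's code: the name check_views, the error messages and the decision to require squareness only for views flagged as distance matrices are all assumptions.

```python
import numpy as np

def check_views(X, is_distance):
    """Hypothetical input validation mirroring the documented ValueError cases."""
    if len(X) != len(is_distance):
        raise ValueError("X and is_distance must have the same length")
    shapes = [np.asarray(view).shape for view in X]
    if len({shape[0] for shape in shapes}) > 1:
        raise ValueError("all views must have the same number of samples")
    for shape, dist in zip(shapes, is_distance):
        # Assumption: only views flagged as distances need to be square.
        if dist and (len(shape) != 2 or shape[0] != shape[1]):
            raise ValueError("distance matrices must be square")

features = np.zeros((3, 2))
distances = np.zeros((3, 3))
check_views([features, distances], [False, True])   # passes silently

try:
    check_views([features, distances], [False])     # mismatched lengths
except ValueError as err:
    message = str(err)
```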
Examples
>>> import numpy as np
>>> from multiview.mvtsne import MvtSNE
>>> m = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
>>> q = np.array([[9, 6, 3], [8, 5, 2], [7, 4, 1]])
>>> r = np.array([[2, 1, 8], [4, 5, 6], [3, 7, 9]]).T
>>> matrices = [m, q, r]
>>> is_distance = [False, False, False]
>>> mvtsne = MvtSNE()
>>> mvtsne.fit_transform(matrices, is_distance)
(matrix([[-1347.89641563, -415.25549328],
        [ 1305.18939063, 398.91164491],
        [ 42.70702501, 16.34384836]]), array([ 0.878037 , 0.64703391, 0.56962457]))