MV-SC

Multiview Spectral Clustering

It provides a class, MVSC, which produces a single clustering assignment that takes into account all the input data from the different views.

class multiview.mvsc.MVSC(k=2, sigmas=None, neighbours=None, clustering=True)

Multiview spectral clustering on a list of matrices or distance matrices.

Computes the multiview spectral clustering of data on a list of matrices or distance matrices (or a mix of both), which are assumed to be different views of the same data. For plain data matrices, the Euclidean distance is used to generate the distance matrix for that view.
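
As an illustration of the conversion applied to plain data matrices, the following is a minimal NumPy sketch of a pairwise Euclidean distance matrix. The helper name euclidean_distance_matrix is hypothetical and is not part of the multiview package; the library's internal implementation may differ.

```python
import numpy as np

def euclidean_distance_matrix(data):
    """Return the (n_samples, n_samples) matrix of pairwise Euclidean distances.

    Hypothetical helper for illustration only; rows of `data` are samples.
    """
    data = np.asarray(data, dtype=float)
    # Differences between every pair of rows: shape (n, n, n_features).
    diff = data[:, np.newaxis, :] - data[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three samples with two features each.
view = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
dist = euclidean_distance_matrix(view)
```

The result is square and symmetric with a zero diagonal, which is the shape a view must have when it is passed directly as a distance matrix.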

Notes

All input views must have the same number of samples (rows).

Parameters:
  • k (int) – Number of desired clusters.
  • sigmas (Either None, an integer value or a vector of int, default: None) – They correspond to the sigma parameter in the Gaussian radial basis function. If it is None then the default sigma computation is used (average distance to the log(n)-th neighbour, with n = number of samples), unless neighbours has a value different from None. If it is a single number then the same sigma is applied to all input views. If it is a vector each value in it is applied to the corresponding input view.
  • neighbours (Either None, an integer value or a vector of int, default: None) – They correspond to the expected number of neighbours per point, used to estimate the sigma values of the Gaussian radial basis function. If it is None then the default sigma computation is used (average distance to the log(n)-th neighbour, with n = number of samples). If it is a single value then the same number of neighbours is used on all input views, else each value in the vector is applied to the corresponding input view. It has no effect if sigmas is not None.
  • clustering (boolean) – Whether to perform the clustering on the projection (True) or to skip the clustering step of the algorithm (False).
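
The "single value or one value per view" rule shared by sigmas and neighbours can be sketched as follows. The helper per_view is hypothetical and only illustrates the documented behaviour; it is not taken from the library.

```python
import numpy as np

def per_view(value, n_views):
    """Expand None, a scalar, or a sequence into one entry per input view.

    Hypothetical helper: it mirrors the documented rule, not the library code.
    """
    if value is None:
        # None means "use the default computation" for every view.
        return [None] * n_views
    if np.isscalar(value):
        # A single number is applied to all views.
        return [value] * n_views
    values = list(value)
    if len(values) != n_views:
        raise ValueError("expected one value per input view")
    return values

same_for_all = per_view(2, 3)          # one value reused for three views
one_each = per_view([1, 2, 5], 3)      # one value per view
defaults = per_view(None, 3)           # fall back to the default sigma
```
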
embedding_

ndarray – Clustering of the nviews input data.

evalues_

ndarray – Eigenvalues computed during spectral clustering.

evectors_

ndarray – Eigenvectors computed during spectral clustering.

sigmas_

ndarray – Best sigmas used for calculating Gaussian similarity.
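
The default sigma (average distance to the log(n)-th neighbour) and the Gaussian similarity can be sketched as below. This is a sketch under stated assumptions, not the library's code: the function names are hypothetical, rounding log(n) to the nearest integer is an assumption (the description only says "log(n)-th"), and the kernel form exp(-d² / (2σ²)) is one common parameterisation that may differ from the one used internally.

```python
import math
import numpy as np

def default_sigma(dist, neighbours=None):
    """Average distance from each sample to its j-th nearest neighbour.

    Assumption: j = round(log(n)) (natural log), at least 1, when
    `neighbours` is None.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    if neighbours is None:
        neighbours = max(1, int(round(math.log(n))))
    neighbours = min(neighbours, n - 1)
    # After sorting each row, column 0 is the zero self-distance,
    # so column j holds the distance to the j-th nearest neighbour.
    sorted_dist = np.sort(dist, axis=1)
    return float(sorted_dist[:, neighbours].mean())

def gaussian_similarity(dist, sigma):
    """Gaussian radial basis function (assumed form: exp(-d^2 / (2 sigma^2)))."""
    dist = np.asarray(dist, dtype=float)
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

# First view from the Examples section of this page.
m = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]], dtype=float)
dist = np.sqrt(((m[:, None, :] - m[None, :, :]) ** 2).sum(axis=-1))
sigma = default_sigma(dist)
sim = gaussian_similarity(dist, sigma)
```

For this view the sketch gives a sigma of about 1.7320508 (the square root of 3), which agrees with the first sigma reported in the Examples output.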

References

Ng, Andrew Y, Michael I Jordan, and Yair Weiss. 2001. “On spectral clustering: Analysis and an algorithm.” Nips 14 (14). MIT Press: 849–56. doi:10.1.1.19.8100.

Von Luxburg, Ulrike. 2007. “A Tutorial on Spectral Clustering.” Statistics and Computing 17 (4). Springer US: 395–416. doi:10.1007/s11222-007-9033-z.

Shi, Jianbo, and Jitendra Malik. 2000. “Normalized Cuts and Image Segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8): 888–905. doi:10.1109/34.868688.

Trendafilov, Nickolay T. 2010. “Stepwise estimation of common principal components.” Computational Statistics and Data Analysis 54 (12): 3446–57. doi:10.1016/j.csda.2010.03.010.

fit(x, is_distance)

Computes the multiview spectral clustering and returns the clustering, eigenvalues, eigenvectors and sigmas used in the computation.

Notes

All input views must have the same number of samples (rows).

Parameters:
  • x (list) – A list of feature matrices or distance matrices (or a mix of both).
  • is_distance (array-like) – A list or array indicating whether the matrix at the same index in x is a distance matrix (True) or not (False).
  • k (int) – Number of desired clusters.
fit_transform(x, is_distance)

Computes the multiview spectral clustering and returns the clustering, eigenvalues, eigenvectors and sigmas used in the computation.

Notes

All input views must have the same number of samples (rows).

Parameters:
  • x (list) – A list of feature matrices or distance matrices (or a mix of both).
  • is_distance (array-like) – A list or array indicating whether the matrix at the same index in x is a distance matrix (True) or not (False).
Returns:

A tuple with four elements:
  • clustering – a vector of integers with the cluster assignment of each sample (not included if clustering=False).
  • evalues – a matrix with the eigenvalues of the common principal components (CPC) step.
  • evectors – a matrix with the eigenvectors of the CPC step.
  • sigmas – a vector with the sigmas used in the Gaussian radial basis function of each input view.

Return type:

tuple

Raises:
  • ValueError – If the matrices are not square, if the k value is negative, or if the data samples and is_distance parameters do not have the same length.
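
The documented error conditions can be illustrated with a small input check. The helper check_views is hypothetical, and applying the squareness check only to views flagged as distance matrices is an assumption; the library's own validation may be organised differently.

```python
import numpy as np

def check_views(x, is_distance, k):
    """Raise ValueError for the conditions listed above.

    Hypothetical helper for illustration; not part of the multiview package.
    """
    if len(x) != len(is_distance):
        raise ValueError("x and is_distance must have the same length")
    if k < 0:
        raise ValueError("k must not be negative")
    for view, dist_flag in zip(x, is_distance):
        view = np.asarray(view)
        # Assumption: only views passed as distance matrices must be square.
        if dist_flag and (view.ndim != 2 or view.shape[0] != view.shape[1]):
            raise ValueError("distance matrices must be square")

def raises_value_error(*args):
    """Return True if check_views rejects the given arguments."""
    try:
        check_views(*args)
    except ValueError:
        return True
    return False

ok = not raises_value_error([np.zeros((3, 3)), np.zeros((3, 2))], [True, False], 2)
not_square = raises_value_error([np.zeros((3, 2))], [True], 2)
negative_k = raises_value_error([np.zeros((3, 3))], [True], -1)
length_mismatch = raises_value_error([np.zeros((3, 3))], [True, False], 2)
```
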

Examples

>>> import numpy as np
>>> from multiview.mvsc import MVSC
>>> m = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
>>> q = np.array([[9, 6, 3], [8, 5, 2], [7, 4, 1]])
>>> r = np.array([[2, 1, 8], [4, 5, 6], [3, 7, 9]]).T
>>> matrices = [m, q, r]
>>> is_distance = [False, False, False]
>>> mvsc = MVSC(k=3)
>>> mvsc.fit_transform(matrices, is_distance)
    (array([1, 2, 0]), array([[ 0.99983709, 0.99983709,  0.99947615],
                              [ 0.49076485, 0.49076485,  0.44022256],
                              [ 0.10945481, 0.10945481,  0.15255827]]),
    array([[-0.56674541,  0.64092999, -0.51769527],
           [-0.61728928,  0.08583247,  0.78204011],
           [-0.54566802, -0.76278538, -0.34699406]]),
    [1.7320508075688774, 1.7320508075688774, 5.2779168675293677])