Diffusion Maps - a Probabilistic Interpretation for Spectral Embedding and Clustering Algorithms

  • Conference paper
Principal Manifolds for Data Visualization and Dimension Reduction

Part of the book series: Lecture Notes in Computational Science and Engineering (LNCSE, volume 58)

Spectral embedding and spectral clustering are common methods for non-linear dimensionality reduction and clustering of complex high-dimensional datasets. In this paper we provide a diffusion-based probabilistic analysis of algorithms that use the normalized graph Laplacian. Given the pairwise adjacency matrix of all points in a dataset, we define a random walk on the graph of points and a diffusion distance between any two points. We show that the diffusion distance is equal to the Euclidean distance in the embedded space with all eigenvectors of the normalized graph Laplacian. This identity shows that the characteristic relaxation times and processes of the random walk on the graph are the key concepts that govern the properties of these spectral clustering and spectral embedding algorithms. Specifically, for spectral clustering to succeed, a necessary condition is that the mean exit time from each cluster be significantly larger than the largest (slowest) of all relaxation times inside the individual clusters. For complex, multiscale data, this condition may not hold, and multiscale methods need to be developed to handle such situations.
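The identity between diffusion distance and Euclidean distance in the embedded space can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel with a hypothetical bandwidth `eps`, an integer diffusion time `t`, a diffusion distance weighted by the inverse of the stationary distribution, and right eigenvectors normalized to unit norm under that distribution. The function names `diffusion_map` and `diffusion_distance` are illustrative only.

```python
import numpy as np

def diffusion_map(X, eps=1.0, t=3):
    """Return the random-walk matrix, its stationary distribution, and the
    diffusion-map embedding at integer time t (all names are illustrative)."""
    # Pairwise squared Euclidean distances and Gaussian adjacency (assumed kernel).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / eps)
    d = W.sum(axis=1)              # node degrees
    pi = d / d.sum()               # stationary distribution of the walk
    M = W / d[:, None]             # row-stochastic transition matrix D^{-1} W
    # Symmetric conjugate D^{-1/2} W D^{-1/2} shares its eigenvalues with M.
    S = W / np.sqrt(np.outer(d, d))
    lam, V = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]  # sort eigenvalues in decreasing order
    lam, V = lam[order], V[:, order]
    # Right eigenvectors of M, normalized so that sum_x psi_j(x)^2 pi(x) = 1.
    psi = V / np.sqrt(pi)[:, None]
    # Drop the constant eigenvector (eigenvalue 1) and keep all the others.
    embedding = psi[:, 1:] * lam[1:] ** t
    return M, pi, embedding

def diffusion_distance(M, pi, t, i, j):
    """Weighted L2 distance between the t-step distributions started at i and j."""
    Mt = np.linalg.matrix_power(M, t)
    return np.sqrt(np.sum((Mt[i] - Mt[j]) ** 2 / pi))

# Synthetic data, used only to illustrate the identity.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
t = 3
M, pi, embedding = diffusion_map(X, eps=1.0, t=t)

pairs = [(0, 1), (2, 17), (5, 29)]
d_direct = np.array([diffusion_distance(M, pi, t, i, j) for i, j in pairs])
d_embed = np.array([np.linalg.norm(embedding[i] - embedding[j]) for i, j in pairs])
print(np.allclose(d_direct, d_embed))
```

Keeping only the first few columns of `embedding` gives a low-dimensional approximation; how well it preserves the diffusion distance depends on how quickly the eigenvalues decay.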

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nadler, B., Lafon, S., Coifman, R., Kevrekidis, I.G. (2008). Diffusion Maps - a Probabilistic Interpretation for Spectral Embedding and Clustering Algorithms. In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (eds) Principal Manifolds for Data Visualization and Dimension Reduction. Lecture Notes in Computational Science and Engineering, vol 58. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73750-6_10
