Abstract
The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically, which makes it well suited to data clustering. This paper investigates the influence of pairwise constraints in the DPM model. Pairwise constraints come in two types, must-link (ML) and cannot-link (CL), and indicate the relationship between two data points. We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). In the C-DPM, we introduce the concept of a chunklet: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive a Gibbs sampler for the C-DPM based on chunklets. We further propose a principled approach to selecting the most useful constraints, which are then incorporated into the SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. The SC-DPM shows superior clustering performance in these experiments. In addition, the SC-DPM can potentially be used for short-text clustering.
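As an illustration of the chunklet idea, the following minimal sketch shows one plausible way to compile ML constraints into chunklets and to lift CL constraints so that they hold between chunklets. It assumes that a chunklet is the transitive closure of the ML constraints (points connected through any chain of ML pairs fall into the same chunklet), which is the usual convention but is not spelled out in the abstract. The function names `build_chunklets` and `lift_cannot_links` are hypothetical, and the sketch does not reproduce the Gibbs sampler or the constraint selection procedure.

```python
from collections import defaultdict


def build_chunklets(n_points, must_links):
    """Group point indices 0..n_points-1 into chunklets.

    Assumption: a chunklet is a connected component of the must-link graph.
    Unconstrained points become singleton chunklets.
    """
    parent = list(range(n_points))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root to the smaller one, so that each root
            # is the smallest index in its chunklet.
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)
    # Order chunklets by their smallest member for a deterministic result.
    return [groups[root] for root in sorted(groups)]


def lift_cannot_links(chunklets, cannot_links):
    """Translate point-level CL pairs into pairs of chunklet indices."""
    owner = {p: c for c, members in enumerate(chunklets) for p in members}
    lifted = set()
    for a, b in cannot_links:
        ca, cb = owner[a], owner[b]
        if ca == cb:
            # The ML and CL constraints contradict each other.
            raise ValueError(f"cannot-link ({a}, {b}) lies inside one chunklet")
        lifted.add((min(ca, cb), max(ca, cb)))
    return lifted


chunklets = build_chunklets(6, [(0, 1), (1, 2), (4, 5)])
chunklet_cl = lift_cannot_links(chunklets, [(2, 3), (0, 4)])
print(chunklets)    # [[0, 1, 2], [3], [4, 5]]
print(chunklet_cl)  # chunklet pairs (0, 1) and (0, 2); set print order may vary
```

Under this representation, a sampler would assign one cluster label per chunklet rather than per point, and a CL pair would rule out assigning the two chunklets involved to the same cluster.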
Cite this article
Li, C., Rana, S., Phung, D. et al. Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering. Ann. Data. Sci. 3, 205–223 (2016). https://doi.org/10.1007/s40745-016-0082-z
DOI: https://doi.org/10.1007/s40745-016-0082-z