Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9794)

Abstract

The hierarchical Dirichlet process (HDP) was originally designed and evaluated for a single data channel. In this paper we enhanced its ability to model heterogeneous data by giving the base measure a richer structure, namely a product space. The enhanced model, called Product Space HDP (PS-HDP), can (1) simultaneously model heterogeneous data from multiple sources in a Bayesian nonparametric framework and (2) discover multilevel latent structures in the data, yielding different types of topics that can be explained jointly. We experimented with the MDC dataset, a large real-world dataset collected from mobile phones. Our goal was to discover identity–location–time (a.k.a. who-where-when) patterns at different levels: globally across all groups and locally within each group. We analyzed the activities and patterns learned by the model, and visualized them and compared them with the ground truth to demonstrate the merit of the proposed framework. We further evaluated performance quantitatively using standard metrics, including F1-score, normalized mutual information (NMI), Rand index (RI), and purity, and compared the PS-HDP model with popular existing clustering methods (K-Means, NNMF, GMM, DP-Means, and AP). Lastly, we demonstrated the model's ability to learn activities from data with missing values, a common problem in pervasive and ubiquitous computing applications.
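To make two of the evaluation metrics named in the abstract concrete, the following is a minimal sketch of purity and the Rand index using their standard textbook definitions. It is not the authors' evaluation code; the function names and the toy label vectors `truth` and `found` are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, pred_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, []).append(t)
    # For each predicted cluster, count the points in its most common true class.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(true_labels)


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree.

    A pair counts as an agreement when both labelings place the two points
    together, or both place them apart.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy labels, for illustration only.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 2, 2]
print(purity(truth, found))
print(rand_index(truth, found))
```

Both scores lie in [0, 1] and equal 1 when the predicted clustering matches the ground truth up to a relabeling. Purity alone rewards over-segmentation (one cluster per point gives purity 1), which is one reason it is usually reported alongside pair-based or information-theoretic metrics such as RI and NMI.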

Author information

Corresponding author

Correspondence to Thanh-Binh Nguyen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, TB., Nguyen, V., Venkatesh, S., Phung, D. (2016). Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes. In: Cao, H., Li, J., Wang, R. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9794. Springer, Cham. https://doi.org/10.1007/978-3-319-42996-0_11

  • DOI: https://doi.org/10.1007/978-3-319-42996-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42995-3

  • Online ISBN: 978-3-319-42996-0

  • eBook Packages: Computer Science (R0)