Skip to main content

A Distance Scaling Method to Improve Density-Based Clustering

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10939))

Abstract

Density-based clustering is able to find clusters of arbitrary sizes and shapes while effectively separating noise. Despite its advantage over other types of clustering, it is well-known that most density-based algorithms face the same challenge of finding clusters with varied densities. Recently, ReScale, a principled density-ratio preprocessing technique, enables a density-based clustering algorithm to identify clusters with varied densities. However, because the technique is based on one-dimensional scaling, it does not do well in datasets which require multi-dimensional scaling. In this paper, we propose a multi-dimensional scaling method, named DScale, which rescales based on the computed distance. It overcomes the key weakness of ReScale and requires one less parameter while maintaining the simplicity of the implementation. Our empirical evaluation shows that DScale has better clustering performance than ReScale for three existing density-based algorithms, i.e., DBSCAN, OPTICS and DP, on synthetic and real-world datasets.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman and Hall/CRC Press, Boca Raton (2013)

    MATH  Google Scholar 

  2. Ankerst, M., Breunig, M.M., Kriegel, H.P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. SIGMOD 1999, pp. 49–60. ACM, New York (1999)

    Google Scholar 

  3. Borg, I., Groenen, P.J., Mair, P.: Applied Multidimensional Scaling. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31848-1

    Book  Google Scholar 

  4. Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 2003 SIAM International Conference on Data Mining, pp. 47–58. SIAM (2003)

    Chapter  Google Scholar 

  5. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 226–231. AAAI Press (1996)

    Google Scholar 

  6. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco (2011)

    MATH  Google Scholar 

  7. Hinneburg, A., Gabriel, H.-H.: DENCLUE 2.0: fast clustering based on kernel density estimation. In: R. Berthold, M., Shawe-Taylor, J., Lavrač, N. (eds.) IDA 2007. LNCS, vol. 4723, pp. 70–80. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74825-0_7

    Chapter  Google Scholar 

  8. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Hoboken (1990)

    Book  Google Scholar 

  9. Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml

  10. Pearson, E.S.: The probability integral transformation for testing goodness of fit and combining independent tests of significance. Biometrika 30(1/2), 134–148 (1938)

    Article  Google Scholar 

  11. Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)

    Article  Google Scholar 

  12. Zhu, Y., Ting, K.M., Carman, M.J.: Density-ratio based clustering for discovering clusters with varying densities. Pattern Recogn. 60, 983–997 (2016)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ye Zhu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zhu, Y., Ting, K.M., Angelova, M. (2018). A Distance Scaling Method to Improve Density-Based Clustering. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science(), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-93040-4_31

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93039-8

  • Online ISBN: 978-3-319-93040-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics