Abstract
Privacy-preserving data mining aims to keep data safe, yet useful. But algorithms providing strong guarantees often end up with low utility. We propose a novel privacy preserving framework that thwarts an adversary from inferring an unknown data point by ensuring that the estimation error is almost invariant to the inclusion/exclusion of the data point. By focusing directly on the estimation error of the data point, our framework is able to significantly lower the perturbation required. We use this framework to propose a new privacy aware K-means clustering algorithm. Using both synthetic and real datasets, we demonstrate that the utility of this algorithm is almost equal to that of the unperturbed K-means, and at strict privacy levels, almost twice as good as compared to the differential privacy counterpart.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
available at URL https://archive.ics.uci.edu/ml/datasets.html.
References
Agrawal, R., Srikant, R.: Privacy-preserving data mining. ACM SIGMOD Rec. 29(2), 439–450 (2000). ACM
Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertainty Fuzziness Knowl. Based Syst. 10(05), 557–570 (2002)
Ciriani, V., di Vimercati, S.D.C., Foresti, S., Samarati, P.: k-anonymous data mining: a survey. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-Preserving Data Mining. Advances in Database Systems, vol. 34, pp. 105–136. Springer, US (2008)
Malik, M.B., Ghazi, M.A., Ali, R.: Privacy preserving data mining techniques: current scenario and future prospects. In: ICCCT 2012, pp. 26–32. IEEE (2012)
Begelman, G., Keller, P., Smadja, F., et al.: Automated tag clustering: improving search and exploration in the tag space. In: Collaborative Web Tagging Workshop at WWW2006, pp. 15–33 (2006)
Fred, A.L., Jain, A.K.: Data clustering using evidence accumulation. In: ICPR 2002, vol. 4, pp. 276–280. IEEE (2002)
Zeng, H.-J., He, Q.-C., Chen, Z., Ma, W.-Y., Ma, J.: Learning to cluster web search results. In: ACM SIGIR 2004, pp. 210–217 (2004)
Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: KDD 2003, pp. 206–215. ACM (2003)
Inan, A., Kaya, S.V., Saygın, Y., Savaş, E., Hintoğlu, A.A., Levi, A.: Privacy preserving clustering on horizontally partitioned data. Data Knowl. Eng. 63(3), 646–666 (2007)
Jagannathan, G., Wright, R.N.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: KDD 2005, pp. 593–599. ACM (2005)
Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006)
Chaudhuri, K., Monteleoni, C.: Privacy-preserving logistic regression. In: NIPS 2009, pp. 289–296 (2009)
Jagannathan, G., Pillaipakkamnatt, K., Wright, R.N.: A practical differentially private random decision tree classifier. In: ICDMW 2009, pp. 114–121. IEEE (2009)
Hua, J., Xia, C., Zhong, S.: Differentially private matrix factorization. In: IJCAI (2015)
Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the sulq framework. In: PODS 2005, pp. 128–138. ACM (2005)
McSherry, F.D.: Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: ACM SIGMOD International Conference on Management of Data (2009)
Su, D., Cao, J., Li, N., Bertino, E., Jin, H.: Differentially private \(k\)-means clustering. CoRR, abs/1504.05998 (2015)
Rana, S., Gupta, S., Venkatesh, S.: Differentially private random forest with high utility. In: IEEE International Conference on Data Mining (2015)
Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D.-Z., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28(2), 129–137 (1982)
Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. CRC Press, Boca Raton (1994)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)
Manning, C.D., Raghavan, P., Schütze, H., et al.: Introduction to Information Retrieval, vol. 1. Cambridge University Press, Cambridge (2008)
Salibian-Barrera, M., Zamar, R.H.: Bootstrapping robust estimates of regression. Ann. Stat. 30, 556–582 (2002)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, T.D., Gupta, S., Rana, S., Venkatesh, S. (2016). Privacy Aware K-Means Clustering with High Utility. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science(), vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_31
Download citation
DOI: https://doi.org/10.1007/978-3-319-31750-2_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer ScienceComputer Science (R0)