Elsevier

Future Generation Computer Systems

Volume 87, October 2018, Pages 341-350

Privacy-preserving machine learning with multiple data providers

https://doi.org/10.1016/j.future.2018.04.076

Highlights

  • To protect data privacy, multiple parties encrypt their data under their own public keys of a double decryption algorithm before outsourcing it to the cloud for storage and processing.

  • To improve the efficiency and accuracy of the computation, the cloud transforms the encrypted data into noised data, so that machine learning algorithms can be executed on the noised data with ϵ-differential privacy.

  • The proposed scheme is proven to be secure in the security model.

Abstract

With the fast development of cloud computing, more and more data storage and computation are moved from local machines to the cloud, especially for applications of machine learning and data analytics. However, cloud servers are run by a third party and cannot be fully trusted by users. As a result, how to perform privacy-preserving machine learning over cloud data from different data providers becomes a challenge. Therefore, in this paper, we propose a novel scheme that protects both the data sets of different providers and the data sets on the cloud. To meet the privacy requirements of different providers, we use public-key encryption with a double decryption algorithm (DD-PKE) to encrypt their data sets under different public keys. To protect the privacy of the data sets on the cloud, we use ϵ-differential privacy. Furthermore, the noise for ϵ-differential privacy is added by the cloud server, instead of the data providers, for different data analytics. Our scheme is proven to be secure in the security model. The experiments also demonstrate the efficiency of our protocol with different classical machine learning algorithms.

Introduction

With the fast development of cloud computing, more and more data and applications are moved from local machines to cloud servers, including machine learning and other data analytics. However, the cloud computing platform cannot be fully trusted because it is run by a third party. Cloud users lose control of their data after outsourcing it to the cloud. To protect privacy, the data are usually encrypted before they are uploaded to cloud storage. However, encryption renders data utilization difficult.

Though some traditional techniques, such as homomorphic encryption, provide solutions for computing over encrypted data, they are inefficient in practice. To address this challenge, another important notion, differential privacy, has been proposed. It not only protects privacy but also supports efficient data operations.
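The efficiency argument for differential privacy can be made concrete with the standard Laplace mechanism, which releases a statistic after adding noise calibrated to its sensitivity. The sketch below is illustrative only; the function names and parameters are ours, not from the paper.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) by inverting its CDF."""
    u = random.random() - 0.5          # uniform on (-0.5, 0.5)
    sign = -1.0 if u < 0 else 1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_mean(values, lower, upper, epsilon):
    """epsilon-DP mean of values clipped to [lower, upper].

    Changing one record moves the clipped mean by at most
    (upper - lower) / n, so Laplace noise with scale
    sensitivity / epsilon yields epsilon-differential privacy.
    """
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon)
```

Unlike homomorphic approaches, the noisy statistic costs only a handful of plaintext arithmetic operations, which is the efficiency advantage noted above.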

However, most previous works mainly focus on data from a single user, whereas in machine learning the data commonly come from different data providers. Therefore, how to perform machine learning over cloud data from multiple users becomes a new challenge. Traditional differential privacy techniques and encryption methods are not practical in this environment. On one hand, the data from different users are encrypted under different public keys or perturbed with different noises, which makes computation difficult. On the other hand, the data have to be processed in different ways for different applications, which makes both the communication overhead and the computation overhead huge.

Main idea. To tackle the above challenges, we propose a scheme named privacy-preserving machine learning under multiple keys (PMLM). Secure multi-party computation (SMC) only supports computation on data encrypted under the same public key, and its efficiency and accuracy need to be improved. Our PMLM scheme is therefore designed as an efficient solution that handles data encrypted under different public keys by different data providers while improving efficiency and accuracy. Our technique is based on a new public-key encryption scheme with a double decryption algorithm (DD-PKE) and on differential privacy. The DD-PKE scheme is additively homomorphic and has two independent decryption algorithms, which allows the outsourced data set to be transformed into a randomized data set. Differential privacy is used to add statistical noise to the outsourced data set for data analyses and computations.

Our PMLM scheme works as follows. First, we set up a public-key encryption scheme with a double decryption algorithm (DD-PKE) to protect the data privacy of the multiple data providers. During this phase, we do not take differential privacy protection into consideration. We then use a cloud server to add different statistical noises to the outsourced ciphertexts according to the different applications of the data analyst; these noises are encrypted under a public key corresponding to the outsourced ciphertexts. Finally, the data analyst downloads the noise-added ciphertext data sets, decrypts them using his or her own master key, and performs a machine learning task over the resulting joint distribution with minimum error.
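The step in which the cloud adds encrypted noise relies on additive homomorphism: the cloud can combine an outsourced ciphertext with an encrypted noise value without seeing either plaintext. The toy Paillier cryptosystem below is a stand-in for DD-PKE (it shares the additive homomorphism but does not model the second, master decryption algorithm), with deliberately insecure toy parameters; it is a sketch, not the paper's construction.

```python
import math
import random

class Paillier:
    """Toy Paillier cryptosystem: additively homomorphic, like DD-PKE.

    The primes are far too small to be secure; illustration only.
    """

    def __init__(self, p: int = 1789, q: int = 1861):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = math.lcm(p - 1, q - 1)   # private key
        self.g = self.n + 1                 # standard choice of generator
        # mu = (L(g^lam mod n^2))^{-1} mod n, used in decryption
        self.mu = pow(self._L(pow(self.g, self.lam, self.n2)), -1, self.n)

    def _L(self, x: int) -> int:
        return (x - 1) // self.n

    def enc(self, m: int) -> int:
        r = random.randrange(1, self.n)
        while math.gcd(r, self.n) != 1:
            r = random.randrange(1, self.n)
        return (pow(self.g, m, self.n2) * pow(r, self.n, self.n2)) % self.n2

    def dec(self, c: int) -> int:
        return (self._L(pow(c, self.lam, self.n2)) * self.mu) % self.n

    def add(self, c1: int, c2: int) -> int:
        """Homomorphic addition: Enc(a) * Enc(b) decrypts to a + b."""
        return (c1 * c2) % self.n2
```

For example, the cloud holding `enc(42)` can multiply in `enc(7)` (its encrypted statistical noise) and the result decrypts to 49, without the cloud ever learning either value.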

Our Contributions. In our PMLM scheme, we assume that the cloud server and the data analyst do not collude with each other and that they are semi-honest. In all steps of the PMLM scheme, the multiple users do not interact with each other. We show that our PMLM scheme is IND-CCA secure in the random oracle model.

In particular, the main contributions of this work are summarized as follows:

  • In this work, the cloud server has the authority to add different statistical noises to the outsourced data set according to the different queries of the data analyst, rather than each data provider adding statistical noise itself for a single fixed application.

  • We use a DD-PKE cryptosystem to preserve the privacy of the data providers’ data sets, which can be used to transform the encrypted data into a randomized data set without information leakage.

  • In our PMLM scheme, the machine learning task is performed on a randomized data set with ϵ-differential privacy rather than on the encrypted data set. This process improves the computational efficiency and data analysis accuracy.

Organization of the Paper. The remainder of this paper is organized as follows. Section 2 provides a literature review over privacy-preserving machine learning based on differential privacy protection. Section 3 presents some notations and definitions on cryptographic primitives and differential privacy. In Section 4, we present the system model, the problem statement and the adversary model. In Section 5, we provide the PMLM scheme. Then, we present our simulation results in Section 6 and the security analysis in Section 7. Finally, the conclusions and directions for future work are presented in Section 8.

Section snippets

Related work

Machine learning is the process of programming computers to optimize a performance criterion using example data or prior experience. Because of its powerful ability to process large amounts of data, machine learning has been applied in various fields in recent years, including speaker recognition [1], image recognition [2], [3] and signal processing [4]. To protect the data privacy in the machine learning model, two well-known lines of research should be considered in our work.

Preliminaries

In this section, we present some notations, cryptographic primitives and differential privacy that will be used throughout this paper.
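For reference, the ϵ-differential-privacy guarantee used throughout is the standard one: a randomized mechanism M satisfies ϵ-DP if, for every pair of neighboring data sets D and D′ (differing in a single record) and every set of outputs S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\epsilon} \cdot \Pr[M(D') \in S].
```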

System and adversary models

In this section, we present the definitions of our system model, problem statement and the adversary model.

Our solution

In this section, we first present the main steps of our solution. We then describe in detail the construction of our solution based on the DD-PKE cryptosystem Π1=(Setup,KeyGen,Enc,uDec,mDec) and ϵ-DP. The DD-PKE cryptosystem Π1 is obtained by applying to the basic scheme in [31] the generic transformation proposed in [33], achieving CCA security in the random oracle model [32].

  • 1. Initialization. In this step, DA runs a Setup algorithm to set up the

Simulation results

In this section, we show how we use our scheme to preserve data privacy according to the DD-PKE cryptosystem Π1 and ϵ-DP. On the one hand, the performance evaluation of the DD-PKE cryptosystem Π1 is conducted on a PC with an Intel(R) Core(TM) i7-6500U CPU at 2.59 GHz and 8 GB of RAM. All programs for the cryptosystem Π1 are implemented in MAGMA. We first choose a security parameter to test the operations in cryptosystem Π1. Second, we randomly choose two primes p and q from the interval (2^{κ1}, 2^{κ1+1})

Security analysis

In this section, we first present the security analysis of the basic cryptographic encryption primitive and ϵ-DP before analyzing the security of our PMLM scheme.

Conclusion and future work

In this paper, we proposed PMLM, a scheme for privacy-preserving machine learning under multiple keys, which allows multiple data providers to outsource encrypted data sets to a cloud server for data storage and processing. In our work, the cloud server can add different statistical noises to the outsourced data sets according to the different queries of the data analyst, which differs from existing works (in which data providers add statistical noise themselves). Our work is mainly based

Acknowledgments

This work was supported by the Natural Science Foundation of Guangdong Province for Distinguished Young Scholars, China (2014A030306020), the Guangzhou scholars project for universities of Guangzhou, China (No. 1201561613), the Science and Technology Planning Project of Guangdong Province, China (2015B010129015), the National Natural Science Foundation of China (No. 61472091, No. 61702126) and the National Natural Science Foundation of China Outstanding Youth Foundation (No. 61722203).


References (35)

  • R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, J. Wernsing, Cryptonets: Applying neural networks to...
  • J.W. Bos et al., Improved security for a ring-based fully homomorphic encryption scheme
  • E. Hesamifard, H. Takabi, M. Ghasemi, CryptoDL: Deep neural networks over encrypted data, 2017. ArXiv preprint...
  • C. Gao et al., Privacy-preserving naive Bayes classifiers secure against the substitution-then-comparison attack, Inform. Sci. (2018)
  • P. Li et al., Privacy-preserving outsourced classification in cloud computing, Cluster Comput. (2017)
  • X. Chen et al., Verifiable computation over large database with incremental updates, IEEE Trans. Comput. (2016)
  • X. Chen et al., New algorithms for secure outsourcing of large-scale systems of linear equations, IEEE Trans. Inf. Forensics Secur. (2015)

    Ping Li received the M.S. and Ph.D. degree in applied mathematics from Sun Yat-sen University in 2011 and 2016, respectively. She is currently a postdoc under supervisor Jin Li in School of Computer Science and Educational Software, Guangzhou University. Her current research interests include cryptography, privacy-preserving and cloud computing.

    Tong Li received his B.S. (2011) and M.S. (2014) from Taiyuan University of Technology and Beijing University of Technology, respectively, both in Computer Science Technology. Currently, he is a Ph.D. candidate at Nankai University. His research interests include applied cryptography and data privacy protection in cloud computing.

    Heng Ye received his B.S. (2011) from Beijing Jiaotong university in computer science & technology. Now he is a Ph.D. candidate at Beijing Jiaotong University since 2016. His research interests include differential privacy, attribute based encryption and internet of things.

    Jin Li received the B.S. degree in mathematics from Southwest University in 2002 and the Ph.D. degree in information security from Sun Yat-sen University in 2007. Currently, he works at Guangzhou University as a professor. He has been selected as one of science and technology new star in Guangdong province. His research interests include applied cryptography and security in cloud computing. He has published more than 70 research papers in refereed international conferences and journals and has served as the program chair or program committee member in many international conferences.

    Xiaofeng Chen (SM16) received the B.S. and M.S. degrees in mathematics from Northwest University, China, in 1998 and 2000, respectively, and the Ph.D. degree in cryptography from Xidian University in 2003. He is currently with Xidian University as a Professor. He has authored or co-authored over 100 research papers in refereed international conferences and journals. His current research interests include applied cryptography and cloud computing security. His work has been cited over 3000 times in Google Scholar. He is on the Editorial Board of Security and Communication Networks, Computing and Informatics, and the International Journal of Embedded Systems. He has served as the Program/ General Chair or Program Committee Member in over 30 international conferences.

    Yang Xiang (A’08-M’09-SM’12) received the Ph.D. degree in computer science from Deakin University, Melbourne, Australia. He is currently a Full Professor with the School of Information Technology, Deakin University. He is the Director of the Network Security and Computing Lab (NSCLab) and the Associate Head of School (Industry Engagement). He is the Chief Investigator of several projects in network and system security, funded by the Australian Research Council (ARC). He has published more than 150 research papers in many international journals and conferences. Two of his papers were selected as the featured articles in the April 2009 and the July 2013 issues of the IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS. He has published two books, Software Similarity and Classification (Springer, 2012) and Dynamic and Advanced Data Mining for Progressing Technological Development (IGI-Global, 2009). His research interests include network and system security, distributed systems, and networking. Prof. Xiang has served as the Program/General Chair for many international conferences. He serves as an Associate Editor of the IEEE TRANSACTIONS ON COMPUTERS, IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, and Security and Communication Networks, and an Editor of the Journal of Network and Computer Applications. He is the Coordinator, Asia, for the IEEE Computer Society Technical Committee on Distributed Processing (TCDP).
