Abstract
Enabling accurate analysis of social network data while preserving differential privacy has been challenging, since graph features such as the clustering coefficient or modularity often have high sensitivity, unlike traditional aggregate functions (e.g., count and sum) on tabular data. In this paper, we treat a graph statistic as a function f and develop a divide and conquer approach to enforce differential privacy. The basic procedure is to first decompose the target computation f into several less complex unit computations \(f_1,\ldots,f_m\) connected by basic mathematical operations (e.g., addition, subtraction, multiplication, division), then perturb the output of each \(f_i\) with Laplace noise derived from its own sensitivity value and its share \(\epsilon_i\) of the privacy budget, and finally combine the perturbed \(f_i\) into the perturbed output of the computation f. We examine how the various operations affect the accuracy of the composed computation. When unit computations have large global sensitivity values, we enforce differential privacy by calibrating noise to the smooth sensitivity rather than the global sensitivity; this preserves the strict differential privacy guarantee with noise of smaller magnitude. We illustrate our approach using the clustering coefficient, a popular statistic in social network analysis. Empirical evaluations on five real social networks and on various synthetic graphs generated from three random graph models show that the developed divide and conquer approach outperforms the direct approach.
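The divide and conquer procedure can be sketched in a few lines. The sketch below is illustrative only: the unit values, sensitivities, and the even split of the privacy budget across units are hypothetical choices, not the allocation studied in the paper.

```python
import math
import random

def lap(scale):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5  # u in [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dnc_perturb(units, eps_total):
    """Divide and conquer: perturb each unit computation f_i with Laplace
    noise calibrated to its own sensitivity and its share eps_i of the
    privacy budget (here split evenly, one possible allocation)."""
    eps_i = eps_total / len(units)
    return [v + lap(s / eps_i) for v, s in units]

# e.g. a ratio statistic f = f1 / f2 released from its perturbed parts
f1, gs1 = 12.0, 1.0   # hypothetical unit value and its sensitivity
f2, gs2 = 30.0, 2.0
p1, p2 = dnc_perturb([(f1, gs1), (f2, gs2)], eps_total=1.0)
noisy_f = p1 / p2     # perturbed output of the composed computation
```

By sequential composition, releasing all perturbed units consumes \(\sum_i \epsilon_i = \epsilon\) in total.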
References
Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election. In: WWW-2005 Workshop on the Weblogging Ecosystem
Aggarwal CC, Yu PS (2008) Privacy-preserving data mining: models and algorithms. Springer, New York
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Barak B, Chaudhuri K, Dwork C, Kale S, McSherry F, Talwar K (2007) Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 273–282
Blum A, Dwork C, McSherry F, Nissim K (2005) Practical privacy: the SuLQ framework. In: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 128–138
Blum A, Ligett K, Roth A (2008) A learning theory approach to non-interactive database privacy. In: Proceedings of the 40th annual ACM symposium on theory of computing. ACM, pp 609–618
Caci B, Cardaci M, Tabacchi ME (2012) Facebook as a small world: a topological hypothesis. Soc Netw Anal Min 2(2):163–167
Chaudhuri K, Monteleoni C (2008) Privacy-preserving logistic regression. In: Proceedings of the twenty-second annual conference on neural information processing systems (NIPS). Citeseer, pp 289–296
Costa LF, Rodrigues FA, Travieso G, Boas PRV (2007) Characterization of complex networks: a survey of measurements. Adv Phys 56(1):167–242
Ding B, Winslett M, Han J, Li Z (2011) Differentially private data cubes: optimizing noise sources and consistency. In: SIGMOD conference, pp 217–228
Du W, Teng Z, Zhu Z (2008) Privacy-MaxEnt: integrating background knowledge in privacy quantification. In: ACM SIGMOD
Dwork C (2011) A firm foundation for private data analysis. Commun ACM 54(1):86–95
Dwork C, Lei J (2009) Differential privacy and robust statistics. In: Proceedings of the 41st annual ACM symposium on theory of computing. ACM, pp 371–380
Dwork C, Smith A (2010) Differential privacy for statistics: what we know and what we want to learn. J Priv Confid 1(2):2
Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M (2006a) Our data, ourselves: privacy via distributed noise generation. In: Advances in cryptology-EUROCRYPT 2006, pp 486–503
Dwork C, McSherry F, Nissim K, Smith A (2006b) Calibrating noise to sensitivity in private data analysis. In: Theory of cryptography, pp 265–284
Erdős P, Rényi A (1959) On random graphs I. Publ Math Debrecen 6:290–297
Friedman A, Schuster A (2010) Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 493–502
Ganta SR, Kasiviswanathan SP, Smith A (2008) Composition attacks and auxiliary information in data privacy. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 265–273
Ghosh A, Roughgarden T, Sundararajan M (2009) Universally utility-maximizing privacy mechanisms. In: Proceedings of the 41st annual ACM symposium on theory of computing. ACM, pp 351–360
Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: ICDM. IEEE, pp 169–178
Hay M, Rastogi V, Miklau G, Suciu D (2010) Boosting the accuracy of differentially private histograms through consistency. Proc VLDB Endow 3(1)
Kifer D, Machanavajjhala A (2011) No free lunch in data privacy. In: SIGMOD conference, pp 193–204
Leskovec J (2013) SNAP: Stanford network analysis platform
Lee J, Clifton C (2012) Differential identifiability. In: KDD
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: ICDE
Li C, Hay M, Rastogi V, Miklau G, McGregor A (2010) Optimizing linear counting queries under differential privacy. In: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 123–134
Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-diversity: privacy beyond k-anonymity. In: Proceedings of the IEEE ICDE conference
Martin DJ, Kifer D, Machanavajjhala A, Gehrke J, Halpern JY (2007) Worst-case background knowledge for privacy-preserving data publishing. In: ICDE
McSherry F, Mironov I (2009) Differentially private recommender systems. In: KDD. ACM
McSherry FD (2009) Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of the 35th SIGMOD international conference on management of data. ACM, pp 19–30
Nissim K, Raskhodnikova S, Smith A (2007) Smooth sensitivity and sampling in private data analysis. In: Proceedings of the thirty-ninth annual ACM symposium on theory of computing. ACM, pp 75–84
Rastogi V, Hay M, Miklau G, Suciu D (2009) Relationship privacy: output perturbation for queries with joins. In: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 107–116
Sallaberry A, Zaidi F, Melancon G (2013) Model for generating artificial social networks having community structures with small-world and scale-free properties. Soc Netw Anal Min
Samarati P, Sweeney L (1998) Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In: Proceedings of the IEEE symposium on research in security and privacy
Sarathy R, Muralidhar K (2009) Differential privacy for numeric data. In: Joint UNECE/Eurostat work session on statistical data confidentiality, Bilbao, Spain
Wang Y, Wu X, Zhu J, Xiang Y (2012) On learning cluster coefficient of private networks. In: ASONAM
Watts DJ, Strogatz SH (1998) Collective dynamics of 'small-world' networks. Nature 393(6684):440–442
Wang Y, Wu X, Wu L (2013) Differential privacy preserving spectral graph analysis. In: PAKDD
Xiao X, Wang G, Gehrke J (2010) Differential privacy via wavelet transforms. In: Data engineering (ICDE), 2010 IEEE 26th international conference on. IEEE, pp 225–236
Xiao X, Bender G, Hay M, Gehrke J (2011) ireduct: differential privacy with reduced relative errors. In: SIGMOD conference, pp 229–240
Ying X, Wu X, Wang Y (2013) On linear refinement of differential privacy-preserving query answering. In: PAKDD
Zaidi F (2013) Small world networks and clustered small world networks with random connectivity. Soc Netw Anal Min 3(1):51–63
Acknowledgments
The conference version of this work was published in Wang et al. (2012). This journal version contains significant extensions, including detailed theoretical proofs and extensive empirical evaluations. We would like to thank the anonymous reviewers for their invaluable comments. This work was supported in part by the U.S. National Science Foundation (0546027, 0831204, 0915059, 1047621), the U.S. National Institutes of Health (1R01GM103309-01A1), and the Shanghai Magnolia Science & Technology Talent Fund (11BA1412600).
A Proof
Proof of Lemma 1
Since E(Lap(0, a′)) = 0 and E(Lap(0, b′)) = 0, we have
Proof of Lemma 2
E(Lap(0, a′)) = E(Lap(0, b′)) = 0; besides, \(E(Lap(0,a')\cdot Lap(0,b'))=E(Lap(0,a'))\cdot E(Lap(0,b'))\) since a, b are independently perturbed with Laplace noise; hence
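The independence step can also be checked numerically; below is a minimal Monte Carlo sketch (the values of a, b and the noise scales are arbitrary, chosen only for illustration):

```python
import math
import random

random.seed(0)

def lap(scale):
    """Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

# a and b are perturbed independently, so E[(a + Lap)(b + Lap)] = a*b:
# the cross terms vanish because E(Lap(0,a'))*E(Lap(0,b')) = 0
a, b, a_scale, b_scale = 3.0, 4.0, 1.0, 1.0
n = 200_000
mean = sum((a + lap(a_scale)) * (b + lap(b_scale)) for _ in range(n)) / n
# `mean` concentrates near a*b = 12 as n grows
```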
Proof of Lemma 3
Since E(Lap(0, a′)) = 0 and E(Lap(0, b′)) = 0, we have
Proof of Result 1
To derive \(LS_{C_{i}}^{(s)}(G),\) we first consider the case for s = 0, i.e., \(LS_{C_{i}}(G).\)
Let G and G′ denote the original graph and its neighbor graph obtained by deleting one edge from G. For a given node i, let \(N_{\Updelta}(i)\) and \(N_{3}(i)\) denote these attributes of i in G, while \(N'_{\Updelta}(i)\) and \(N'_{3}(i)\) denote the same attributes in G′. By definition, we have \(0\leq N_{\Updelta}(i) \leq N_{3}(i) = d_{i}(d_{i}-1)/2.\) When deleting an edge from G, \(N_{\Updelta}(i)\) is decreased by at most \(d_{i}-1,\) while \(N_{3}(i)\) is decreased by exactly \(d_{i}-1.\) Therefore,
when \(0\leq N_{\Updelta}(i)\leq d_{i}-1,\) we have
when \(d_{i}-1 \leq N_{\Updelta}(i)\leq d_{i}(d_{i}-1)/2 ,\) we have
Thus \(LS_{C_{i}}(G)= \frac{2}{d_{i}}.\)
In the general case, for s > 0, we have (Eq. 7),
for \(d_{i} - s > 2\); and \(LS_{C_{i}}^{(s)}(G)=GS_{C_{i}}(G)=1\) otherwise.
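The s = 0 bound \(LS_{C_{i}}(G)\leq 2/d_{i}\) can be verified by exhaustive search over small graphs. This is an independent sanity check, not part of the proof; neighbor graphs are obtained by deleting a single edge, as in the argument above, and \(C_i\) is taken to be 0 when \(d_i<2\).

```python
from itertools import combinations

def clustering(adj, i):
    """Local clustering coefficient C_i = 2*N_tri(i) / (d_i*(d_i-1))."""
    nbrs = adj[i]
    d = len(nbrs)
    if d < 2:
        return 0.0
    tri = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return 2.0 * tri / (d * (d - 1))

def to_adj(edges, n):
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def local_sensitivity(edges, n, i):
    """Max change of C_i over neighbor graphs obtained by deleting one edge."""
    base = clustering(to_adj(edges, n), i)
    return max((abs(base - clustering(to_adj(edges - {e}, n), i))
                for e in edges), default=0.0)

# check LS_{C_i}(G) <= 2/d_i on every graph with 5 nodes
n = 5
all_edges = list(combinations(range(n), 2))
worst = 0.0  # largest observed LS * d_i / 2; reaches exactly 1
for mask in range(1 << len(all_edges)):
    edges = {e for k, e in enumerate(all_edges) if mask >> k & 1}
    adj = to_adj(edges, n)
    for i in range(n):
        d = len(adj[i])
        if d == 0:
            continue
        ls = local_sensitivity(edges, n, i)
        assert ls <= 2.0 / d + 1e-12
        worst = max(worst, ls * d / 2.0)
```

The bound is tight: `worst` reaches 1, e.g. on a triangle, where deleting the edge opposite a degree-2 node changes its coefficient from 1 to 0.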
Proof of Lemma 4
Since \(E(Lap(0,a'))=0,\) \(E(Lap^{2}(0,a'))=2a'^{2},\) and \(\tilde{a}=a+Lap(0,a'),\) we have
Proof of Result 3
We first consider the case s = 0,
\(LS_{C}^{(s)}(G)=LS_{C}(G)+s\) for \(s \leq b_{ij}\) because we may add one edge to complete a half-built triangle involving edge (i, j), which increases the sensitivity by at most one; meanwhile, \(LS_{C}^{(s)}(G)=LS_{C}(G)+\lfloor\frac{s+b_{ij}}{2}\rfloor\) for \(s > b_{ij}\) because, after completing all the \(b_{ij}\) half-built triangles involving edge (i, j), we have to add two edges to form each triangle that increases the sensitivity by one. Besides, \(LS_{C}^{(s)}(G) \leq GS_{C}(G) = n-1.\) So we have Eq. (17).
Proof of Result 4
Building on the proof of Result 2 given in Nissim et al. (2007), \(LS_{N_{\Updelta}}^{(s)}(G)=3\cdot\max_{i\neq j;\,j\in[n]} c_{ij}(s)\) because, when edge (i, j) is deleted from G, each of the two entries corresponding to vertices i and j is decreased by at most \(\max_{i\neq j;\,j\in[n]} c_{ij}(s)\); besides, there are \(\max_{i\neq j;\,j\in[n]} c_{ij}(s)\) other entries, corresponding to the neighbors shared by vertices i and j, whose values are decreased by one.
For \(N_{3},\) we first consider the case s = 0,
In the general case, \(LS_{N_{3}}^{(s)}(G)=LS_{N_{3}}(G)+s\) for \(s \leq b_{ij}\) and \(LS_{N_{3}}^{(s)}(G)=LS_{N_3}(G)+\lfloor\frac{s+b_{ij}}{2}\rfloor\) for \(s > b_{ij},\) by the same argument as for \(LS_{C}^{(s)}(G)\); moreover, \(LS_{N_{3}}^{(s)}(G)\leq GS_{N_{3}}(G)=2n-4.\) This gives the stated form of \(LS_{N_{3}}^{(s)}(G).\)
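Given a closed form for the distance-s local sensitivities, the smooth sensitivity of Nissim et al. (2007), \(S_{f,\beta}(G)=\max_{s\geq 0} e^{-\beta s}\, LS_{f}^{(s)}(G),\) is a one-line maximization. The sketch below uses a hypothetical \(LS^{(s)}\) profile capped at a global sensitivity of 1; the actual profiles are the closed forms derived above.

```python
import math

def smooth_sensitivity(ls_at_distance, beta, s_max):
    """S(G) = max over 0 <= s <= s_max of exp(-beta*s) * LS^{(s)}(G),
    following Nissim et al. (2007); ls_at_distance(s) returns the local
    sensitivity at distance s from G."""
    return max(math.exp(-beta * s) * ls_at_distance(s)
               for s in range(s_max + 1))

# hypothetical monotone LS^{(s)} profile capped at global sensitivity 1
ls = lambda s: min(1.0, 0.1 * (s + 1))
S = smooth_sensitivity(ls, beta=0.25, s_max=50)  # maximized at s = 3 here
```

Because \(LS^{(s)}\) is capped at the global sensitivity, the maximization may stop early once \(e^{-\beta s}\cdot GS\) falls below the best value found so far.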
Cite this article
Wang, Y., Wu, X., Zhu, J. et al. On learning cluster coefficient of private networks. Soc. Netw. Anal. Min. 3, 925–938 (2013). https://doi.org/10.1007/s13278-013-0127-7