Abstract
Enabling accurate analysis of social network data while preserving differential privacy has been challenging, since graph features such as the clustering coefficient or modularity often have high sensitivity, unlike traditional aggregate functions (e.g., count and sum) on tabular data. In this paper, we treat a graph statistic as a function f and develop a divide and conquer approach to enforce differential privacy. The basic procedure is to first decompose the target computation f into several less complex unit computations \(f_1,\ldots,f_m\) connected by basic mathematical operations (e.g., addition, subtraction, multiplication, division), then perturb the output of each \(f_i\) with Laplace noise derived from its own sensitivity value and its share \(\epsilon_i\) of the privacy budget, and finally combine the perturbed \(f_i\) into the perturbed output of the computation f. We examine how the various operations affect the accuracy of the composed computation. When unit computations have large global sensitivity values, we enforce differential privacy by calibrating noise to the smooth sensitivity rather than the global sensitivity; this preserves the strict differential privacy guarantee with noise of smaller magnitude. We illustrate our approach using the clustering coefficient, a popular statistic in social network analysis. Empirical evaluations on five real social networks and on various synthetic graphs generated from three random graph models show that the developed divide and conquer approach outperforms the direct approach.
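The divide and conquer procedure can be sketched in a few lines. The sketch below is illustrative only: the unit values, sensitivities, and the even split of the privacy budget across units are hypothetical choices, not the allocation studied in the paper.

```python
import math
import random

def lap(scale):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5  # u in [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dnc_perturb(units, eps_total):
    """Divide and conquer: perturb each unit computation f_i with Laplace
    noise calibrated to its own sensitivity and its share eps_i of the
    privacy budget (here split evenly, one possible allocation)."""
    eps_i = eps_total / len(units)
    return [v + lap(s / eps_i) for v, s in units]

# e.g. a ratio statistic f = f1 / f2 released from its perturbed parts
f1, gs1 = 12.0, 1.0   # hypothetical unit value and its sensitivity
f2, gs2 = 30.0, 2.0
p1, p2 = dnc_perturb([(f1, gs1), (f2, gs2)], eps_total=1.0)
noisy_f = p1 / p2     # perturbed output of the composed computation
```

By sequential composition, releasing all perturbed units consumes \(\sum_i \epsilon_i = \epsilon\) in total.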
References
Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election. In: WWW-2005 Workshop on the Weblogging Ecosystem
Aggarwal CC, Yu PS (2008) Privacy-preserving data mining: models and algorithms. Springer, New York
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Barak B, Chaudhuri K, Dwork C, Kale S, McSherry F, Talwar K (2007) Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 273–282
Blum A, Dwork C, McSherry F, Nissim K (2005) Practical privacy: the SuLQ framework. In: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 128–138
Blum A, Ligett K, Roth A (2008) A learning theory approach to non-interactive database privacy. In: Proceedings of the 40th annual ACM symposium on theory of computing. ACM, pp 609–618
Caci B, Cardaci M, Tabacchi ME (2012) Facebook as a small world: a topological hypothesis. Soc Netw Anal Min 2(2):163–167
Chaudhuri K, Monteleoni C (2008) Privacy-preserving logistic regression. In: Proceedings of the twenty-second annual conference on neural information processing systems (NIPS). Citeseer, pp 289–296
Costa LF, Rodrigues FA, Travieso G, Boas PRV (2007) Characterization of complex networks: a survey of measurements. Adv Phys 56(1):167–242
Ding B, Winslett M, Han J, Li Z (2011) Differentially private data cubes: optimizing noise sources and consistency. In: SIGMOD conference, pp 217–228
Du W, Teng Z, Zhu Z (2008) Privacy-MaxEnt: integrating background knowledge in privacy quantification. In: ACM SIGMOD
Dwork C (2011) A firm foundation for private data analysis. Commun ACM 54(1):86–95
Dwork C, Lei J (2009) Differential privacy and robust statistics. In: Proceedings of the 41st annual ACM symposium on theory of computing. ACM, pp 371–380
Dwork C, Smith A (2010) Differential privacy for statistics: what we know and what we want to learn. J Priv Confid 1(2):2
Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M (2006a) Our data, ourselves: privacy via distributed noise generation. In: Advances in cryptology-EUROCRYPT 2006, pp 486–503
Dwork C, McSherry F, Nissim K, Smith A (2006b) Calibrating noise to sensitivity in private data analysis. In: Theory of cryptography, pp 265–284
Erdős P, Rényi A (1959) On random graphs I. Publ Math Debrecen 6:290–297
Friedman A, Schuster A (2010) Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 493–502
Ganta SR, Kasiviswanathan SP, Smith A (2008) Composition attacks and auxiliary information in data privacy. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 265–273
Ghosh A, Roughgarden T, Sundararajan M (2009) Universally utility-maximizing privacy mechanisms. In: Proceedings of the 41st annual ACM symposium on theory of computing. ACM, pp 351–360
Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: ICDM. IEEE, pp 169–178
Hay M, Rastogi V, Miklau G, Suciu D (2010) Boosting the accuracy of differentially private histograms through consistency. Proc VLDB Endow 3(1)
Kifer D, Machanavajjhala A (2011) No free lunch in data privacy. In: SIGMOD conference, pp 193–204
Leskovec J (2013) SNAP: Stanford network analysis platform
Lee J, Clifton C (2012) Differential identifiability. In: KDD
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: ICDE
Li C, Hay M, Rastogi V, Miklau G, McGregor A (2010) Optimizing linear counting queries under differential privacy. In: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 123–134
Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-diversity: privacy beyond k-anonymity. In: Proceedings of the IEEE ICDE conference
Martin DJ, Kifer D, Machanavajjhala A, Gehrke J, Halpern JY (2007) Worst-case background knowledge for privacy-preserving data publishing. In: ICDE
McSherry F, Mironov I (2009) Differentially private recommender systems. In: KDD. ACM
McSherry FD (2009) Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of the 35th SIGMOD international conference on management of data. ACM, pp 19–30
Nissim K, Raskhodnikova S, Smith A (2007) Smooth sensitivity and sampling in private data analysis. In: Proceedings of the thirty-ninth annual ACM symposium on theory of computing. ACM, pp 75–84
Rastogi V, Hay M, Miklau G, Suciu D (2009) Relationship privacy: output perturbation for queries with joins. In: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 107–116
Sallaberry A, Zaidi F, Melancon G (2013) Model for generating artificial social networks having community structures with small-world and scale-free properties. Soc Netw Anal Min
Samarati P, Sweeney L (1998) Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In: Proceedings of the IEEE symposium on research in security and privacy
Sarathy R, Muralidhar K (2009) Differential privacy for numeric data. In: Joint UNECE/Eurostat work session on statistical data confidentiality, Bilbao, Spain
Wang Y, Wu X, Zhu J, Xiang Y (2012) On learning cluster coefficient of private networks. In: ASONAM
Watts DJ, Strogatz SH (1998) Collective dynamics of 'small-world' networks. Nature 393(6684):440–442
Wang Y, Wu X, Wu L (2013) Differential privacy preserving spectral graph analysis. In: PAKDD
Xiao X, Wang G, Gehrke J (2010) Differential privacy via wavelet transforms. In: Data engineering (ICDE), 2010 IEEE 26th international conference on. IEEE, pp 225–236
Xiao X, Bender G, Hay M, Gehrke J (2011) ireduct: differential privacy with reduced relative errors. In: SIGMOD conference, pp 229–240
Ying X, Wu X, Wang Y (2013) On linear refinement of differential privacy-preserving query answering. In: PAKDD
Zaidi F (2013) Small world networks and clustered small world networks with random connectivity. Soc Netw Anal Min 3(1):51–63
Acknowledgments
The conference version of this work was published in Wang et al. (2012). This journal version contains significant extensions, including detailed theoretical proofs and extensive empirical evaluations. We would like to thank the anonymous reviewers for their invaluable comments. This work was supported in part by the U.S. National Science Foundation (0546027, 0831204, 0915059, 1047621), the U.S. National Institutes of Health (1R01GM103309-01A1), and the Shanghai Magnolia Science & Technology Talent Fund (11BA1412600).
A Proof
Proof of Lemma 1
Since E(Lap(0, a′)) = 0 and E(Lap(0, b′)) = 0, we have
Proof of Lemma 2
E(Lap(0, a′)) = E(Lap(0, b′)) = 0; besides, \(E(Lap(0,a')\cdot Lap(0,b'))=E(Lap(0,a'))\cdot E(Lap(0,b'))\) since a, b are independently perturbed with Laplace noise; hence
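The independence step can also be checked numerically; below is a minimal Monte Carlo sketch (the values of a, b and the noise scales are arbitrary, chosen only for illustration):

```python
import math
import random

random.seed(0)

def lap(scale):
    """Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

# a and b are perturbed independently, so E[(a + Lap)(b + Lap)] = a*b:
# the cross terms vanish because E(Lap(0,a'))*E(Lap(0,b')) = 0
a, b, a_scale, b_scale = 3.0, 4.0, 1.0, 1.0
n = 200_000
mean = sum((a + lap(a_scale)) * (b + lap(b_scale)) for _ in range(n)) / n
# `mean` concentrates near a*b = 12 as n grows
```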
Proof of Lemma 3
Since E(Lap(0, a′)) = 0 and E(Lap(0, b′)) = 0, we have
Proof of Result 1
To derive \(LS_{C_{i}}^{(s)}(G),\) we first consider the case for s = 0, i.e., \(LS_{C_{i}}(G).\)
Let G and G′ denote the original graph and its neighbor graph obtained by deleting one edge from G. For a given node i, let \(N_{\Updelta}(i)\) and \(N_{3}(i)\) denote these attributes of i in G, while \(N'_{\Updelta}(i)\) and \(N'_{3}(i)\) denote the same attributes in G′. By definition, we have \(0\leq N_{\Updelta}(i) \leq N_{3}(i) = d_{i}(d_{i}-1)/2.\) When deleting an edge from G, \(N_{\Updelta}(i)\) is decreased by at most \(d_{i}-1,\) while \(N_{3}(i)\) is decreased by exactly \(d_{i}-1.\) Therefore,
when \(0\leq N_{\Updelta}(i)\leq d_{i}-1,\) we have
when \(d_{i}-1 \leq N_{\Updelta}(i)\leq d_{i}(d_{i}-1)/2 ,\) we have
Thus \(LS_{C_{i}}(G)= \frac{2}{d_{i}}.\)
In the general case, for s > 0, we have (Eq. 7),
for \(d_{i} - s > 2\); and \(LS_{C_{i}}^{(s)}(G)=GS_{C_{i}}(G)=1\) otherwise.
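The s = 0 bound \(LS_{C_{i}}(G)\leq 2/d_{i}\) can be verified by exhaustive search over small graphs. This is an independent sanity check, not part of the proof; neighbor graphs are obtained by deleting a single edge, as in the argument above, and \(C_i\) is taken to be 0 when \(d_i<2\).

```python
from itertools import combinations

def clustering(adj, i):
    """Local clustering coefficient C_i = 2*N_tri(i) / (d_i*(d_i-1))."""
    nbrs = adj[i]
    d = len(nbrs)
    if d < 2:
        return 0.0
    tri = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return 2.0 * tri / (d * (d - 1))

def to_adj(edges, n):
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def local_sensitivity(edges, n, i):
    """Max change of C_i over neighbor graphs obtained by deleting one edge."""
    base = clustering(to_adj(edges, n), i)
    return max((abs(base - clustering(to_adj(edges - {e}, n), i))
                for e in edges), default=0.0)

# check LS_{C_i}(G) <= 2/d_i on every graph with 5 nodes
n = 5
all_edges = list(combinations(range(n), 2))
worst = 0.0  # largest observed LS * d_i / 2; reaches exactly 1
for mask in range(1 << len(all_edges)):
    edges = {e for k, e in enumerate(all_edges) if mask >> k & 1}
    adj = to_adj(edges, n)
    for i in range(n):
        d = len(adj[i])
        if d == 0:
            continue
        ls = local_sensitivity(edges, n, i)
        assert ls <= 2.0 / d + 1e-12
        worst = max(worst, ls * d / 2.0)
```

The bound is tight: `worst` reaches 1, e.g. on a triangle, where deleting the edge opposite a degree-2 node changes its coefficient from 1 to 0.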
Proof of Lemma 4
Since \(E(Lap(0,a'))=0,\) \(E(Lap^{2}(0,a'))=2a'^{2},\) and \(\tilde{a}=a+Lap(0,a'),\) we have
Proof of Result 3
We first consider the case s = 0,
\(LS_{C}^{(s)}(G)=LS_{C}(G)+s\) for \(s \leq b_{ij}\) because we may add one edge to complete a half-built triangle involving edge (i, j), which increases the sensitivity by at most one; meanwhile, \(LS_{C}^{(s)}(G)=LS_{C}(G)+\lfloor\frac{s+b_{ij}}{2}\rfloor\) for \(s > b_{ij}\) because, after completing all the \(b_{ij}\) half-built triangles involving edge (i, j), we have to add two edges to form each triangle that increases the sensitivity by one. Besides, \(LS_{C}^{(s)}(G) \leq GS_{C}(G) = n-1.\) So we have Eq. (17).
Proof of Result 4
Building on the proof of Result 2 given in Nissim et al. (2007), \(LS_{N_{\Updelta}}^{(s)}(G)=3\cdot\max_{i\neq j;\,j\in[n]} c_{ij}(s)\) because, when edge (i, j) is deleted from G, each of the two entries corresponding to vertices i and j is decreased by at most \(\max_{i\neq j;\,j\in[n]} c_{ij}(s)\); besides, there are \(\max_{i\neq j;\,j\in[n]} c_{ij}(s)\) other entries, corresponding to the neighbors shared by vertices i and j, whose values are decreased by one.
For \(N_{3},\) we first consider the case s = 0,
In the general case, \(LS_{N_{3}}^{(s)}(G)=LS_{N_{3}}(G)+s\) for \(s \leq b_{ij}\) and \(LS_{N_{3}}^{(s)}(G)=LS_{N_3}(G)+\lfloor\frac{s+b_{ij}}{2}\rfloor\) for \(s > b_{ij},\) by the same argument as for \(LS_{C}^{(s)}(G)\); moreover, \(LS_{N_{3}}^{(s)}(G)\leq GS_{N_{3}}(G)=2n-4.\) This gives the stated form of \(LS_{N_{3}}^{(s)}(G).\)
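Given a closed form for the distance-s local sensitivities, the smooth sensitivity of Nissim et al. (2007), \(S_{f,\beta}(G)=\max_{s\geq 0} e^{-\beta s}\, LS_{f}^{(s)}(G),\) is a one-line maximization. The sketch below uses a hypothetical \(LS^{(s)}\) profile capped at a global sensitivity of 1; the actual profiles are the closed forms derived above.

```python
import math

def smooth_sensitivity(ls_at_distance, beta, s_max):
    """S(G) = max over 0 <= s <= s_max of exp(-beta*s) * LS^{(s)}(G),
    following Nissim et al. (2007); ls_at_distance(s) returns the local
    sensitivity at distance s from G."""
    return max(math.exp(-beta * s) * ls_at_distance(s)
               for s in range(s_max + 1))

# hypothetical monotone LS^{(s)} profile capped at global sensitivity 1
ls = lambda s: min(1.0, 0.1 * (s + 1))
S = smooth_sensitivity(ls, beta=0.25, s_max=50)  # maximized at s = 3 here
```

Because \(LS^{(s)}\) is capped at the global sensitivity, the maximization may stop early once \(e^{-\beta s}\cdot GS\) falls below the best value found so far.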
Cite this article
Wang, Y., Wu, X., Zhu, J. et al. On learning cluster coefficient of private networks. Soc. Netw. Anal. Min. 3, 925–938 (2013). https://doi.org/10.1007/s13278-013-0127-7