
On learning cluster coefficient of private networks

  • Original Article
Social Network Analysis and Mining

Abstract

Enabling accurate analysis of social network data while preserving differential privacy has been challenging because graph features such as the clustering coefficient or modularity often have high sensitivity, unlike traditional aggregate functions (e.g., count and sum) on tabular data. In this paper, we treat a graph statistic as a function f and develop a divide and conquer approach to enforce differential privacy. The basic procedure is to first decompose the target computation f into several less complex unit computations \(f_1,\ldots,f_m\) connected by basic mathematical operations (e.g., addition, subtraction, multiplication, division), then perturb the output of each \(f_i\) with Laplace noise derived from its own sensitivity value and the distributed privacy threshold \(\epsilon_i,\) and finally combine the perturbed \(f_i\) into the perturbed output of computation f. We examine how various operations affect the accuracy of complex computations. When unit computations have large global sensitivity values, we enforce differential privacy by calibrating noise based on the smooth sensitivity rather than the global sensitivity. By doing this, we achieve the strict differential privacy guarantee with noise of smaller magnitude. We illustrate our approach using the clustering coefficient, a popular statistic in social network analysis. Empirical evaluations on five real social networks and various synthetic graphs generated from three random graph models show that the developed divide and conquer approach outperforms the direct approach.
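The procedure described in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the target statistic is a ratio of two unit computations, that the caller supplies each unit's sensitivity, and that the privacy budget is split evenly. The names `perturb` and `private_ratio` and the even split are hypothetical choices, not the paper's specification.

```python
import random


def perturb(value, sensitivity, epsilon, rng):
    """Laplace mechanism: add zero-mean noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a zero-mean Laplace distribution with that scale.
    return value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_ratio(f1, s1, f2, s2, epsilon, rng):
    """Perturb two unit computations separately, then combine them by division."""
    eps1 = eps2 = epsilon / 2.0  # assumed even split of the privacy budget
    noisy_f1 = perturb(f1, s1, eps1, rng)
    noisy_f2 = perturb(f2, s2, eps2, rng)
    # A noisy denominator near zero makes the ratio unstable; a real
    # implementation would need to handle that case explicitly.
    return noisy_f1 / noisy_f2


rng = random.Random(0)
# Hypothetical unit outputs 30 and 100, each with sensitivity 1.
print(private_ratio(30.0, 1.0, 100.0, 1.0, 1.0, rng))
```

Smaller values of `epsilon` or larger sensitivities widen the noise on each unit, which is why the choice of decomposition and of sensitivity measure matters for accuracy.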


Notes

  1. http://www.cs.cmu.edu/~enron/.

  2. http://www.personal.umich.edu/~mejn/netdata/.

  3. http://aimlab.cs.uoregon.edu/smash/.

  4. https://gephi.org/.

References

  • Adamic LA, Glance N (2005) The political blogosphere and the 2004 U.S. election. In: WWW-2005 Workshop on the Weblogging Ecosystem

  • Aggarwal CC, Yu PS (2008) Privacy-preserving data mining: models and algorithms. Springer, New York

  • Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  • Barak B, Chaudhuri K, Dwork C, Kale S, McSherry F, Talwar K (2007) Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 273–282

  • Blum A, Dwork C, McSherry F, Nissim K (2005) Practical privacy: the SuLQ framework. In: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 128–138

  • Blum A, Ligett K, Roth A (2008) A learning theory approach to non-interactive database privacy. In: Proceedings of the 40th annual ACM symposium on theory of computing. ACM, pp 609–618

  • Caci B, Cardaci M, Tabacchi ME (2012) Facebook as a small world: a topological hypothesis. Soc Netw Anal Min 2(2):163–167


  • Chaudhuri K, Monteleoni C (2008) Privacy-preserving logistic regression. In: Proceedings of the twenty-second annual conference on neural information processing systems (NIPS). Citeseer, pp 289–296

  • Costa LF, Rodrigues FA, Travieso G, Boas PRV (2007) Characterization of complex networks: a survey of measurements. Adv Phys 56(1):167–242


  • Ding B, Winslett M, Han J, Li Z (2011) Differentially private data cubes: optimizing noise sources and consistency. In: SIGMOD conference, pp 217–228

  • Du W, Teng Z, Zhu Z (2008) Privacy-MaxEnt: integrating background knowledge in privacy quantification. In: ACM SIGMOD

  • Dwork C (2011) A firm foundation for private data analysis. Commun ACM 54(1):86–95


  • Dwork C, Lei J (2009) Differential privacy and robust statistics. In: Proceedings of the 41st annual ACM symposium on theory of computing. ACM, pp 371–380

  • Dwork C, Smith A (2010) Differential privacy for statistics: what we know and what we want to learn. J Priv Confid 1(2):2


  • Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M (2006a) Our data, ourselves: privacy via distributed noise generation. In: Advances in cryptology-EUROCRYPT 2006, pp 486–503

  • Dwork C, McSherry F, Nissim K, Smith A (2006b) Calibrating noise to sensitivity in private data analysis. In: Theory of cryptography, pp 265–284

  • Erdős P, Rényi A (1959) On random graphs I. Publ Math Debrecen 6:290–297

  • Friedman A, Schuster A (2010) Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 493–502

  • Ganta SR, Kasiviswanathan SP, Smith A (2008) Composition attacks and auxiliary information in data privacy. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 265–273

  • Ghosh A, Roughgarden T, Sundararajan M (2009) Universally utility-maximizing privacy mechanisms. In: Proceedings of the 41st annual ACM symposium on theory of computing. ACM, pp 351–360

  • Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: ICDM. IEEE, pp 169–178

  • Hay M, Rastogi V, Miklau G, Suciu D (2010) Boosting the accuracy of differentially private histograms through consistency. Proc VLDB Endow 3(1)

  • Kifer D, Machanavajjhala A (2011) No free lunch in data privacy. In: SIGMOD conference, pp 193–204

  • Leskovec J (2013) Snap: Stanford network analysis platform

  • Lee J, Clifton C (2012) Differential identifiability. In: KDD

  • Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: ICDE

  • Li C, Hay M, Rastogi V, Miklau G, McGregor A (2010) Optimizing linear counting queries under differential privacy. In: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 123–134

  • Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-diversity: privacy beyond k-anonymity. In: Proceedings of the IEEE ICDE conference

  • Martin DJ, Kifer D, Machanavajjhala A, Gehrke J, Halpern JY (2007) Worst-case background knowledge for privacy-preserving data publishing. In: ICDE

  • McSherry F, Mironov I (2009) Differentially private recommender systems. In: KDD. ACM

  • McSherry FD (2009) Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of the 35th SIGMOD international conference on management of data. ACM, pp 19–30

  • Nissim K, Raskhodnikova S, Smith A (2007) Smooth sensitivity and sampling in private data analysis. In: Proceedings of the thirty-ninth annual ACM symposium on theory of computing. ACM, pp 75–84

  • Rastogi V, Hay M, Miklau G, Suciu D (2009) Relationship privacy: output perturbation for queries with joins. In: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 107–116

  • Sallaberry A, Zaidi F, Melancon G (2013) Model for generating artificial social networks having community structures with small-world and scale-free properties. Soc Netw Anal Min

  • Samarati P, Sweeney L (1998) Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In: Proceedings of the IEEE symposium on research in security and privacy

  • Sarathy R, Muralidhar K (2009) Differential privacy for numeric data. In: Joint UNECE/Eurostat work session on statistical data confidentiality, Bilbao, Spain

  • Wang Y, Wu X, Zhu J, Xiang Y (2012) On learning cluster coefficient of private networks. In: ASONAM

  • Watts D, Strogatz S (1998) The small world problem. Collect Dyn Small World Netw 393:440–442


  • Watts DJ, Strogatz SH (1998) Collective dynamics of 'small-world' networks. Nature 393(6684):440–442

  • Wang Y, Wu X, Wu L (2013) Differential privacy preserving spectral graph analysis. In: PAKDD

  • Xiao X, Wang G, Gehrke J (2010) Differential privacy via wavelet transforms. In: Data engineering (ICDE), 2010 IEEE 26th international conference on. IEEE, pp 225–236

  • Xiao X, Bender G, Hay M, Gehrke J (2011) iReduct: differential privacy with reduced relative errors. In: SIGMOD conference, pp 229–240

  • Ying X, Wu X, Wang Y (2013) On linear refinement of differential privacy-preserving query answering. In: PAKDD

  • Zaidi F (2013) Small world networks and clustered small world networks with random connectivity. Soc Netw Anal Min 3(1):51–63


Acknowledgments

The conference version of this work was published in Wang et al. (2012). This journal version contains significant extensions, including detailed theoretical proofs and extensive empirical evaluations. We would like to thank the anonymous reviewers for their invaluable comments. This work was supported in part by the U.S. National Science Foundation (0546027, 0831204, 0915059, 1047621), the U.S. National Institutes of Health (1R01GM103309-01A1), and the Shanghai Magnolia Science & Technology Talent Fund (11BA1412600).

Author information


Corresponding author

Correspondence to Xintao Wu.

A Proof


Proof of Lemma 1

$$\begin{aligned} & E(u\cdot(a+Lap(0,\, a')) + v\cdot(b+Lap(0,b')))\\ &\quad =E(u\cdot a+v\cdot b) + u\cdot E( Lap(0,a')) +v\cdot E(Lap(0,b')) \end{aligned}$$

Since E(Lap(0, a′)) = 0 and E(Lap(0, b′)) = 0, we have

$$E(u\cdot(a+Lap(0,a'))+v\cdot(b+Lap(0,b'))) =E(u\cdot a+v\cdot b)$$

Proof of Lemma 2

$$\begin{aligned} E((a+Lap(0,a'))\cdot(b+Lap(0,b'))) &=E(a\cdot b+b\cdot Lap(0,a') \\ & \quad+a\cdot Lap(0,b')+Lap(0,a')\cdot Lap(0,b')) \\ &=E(a\cdot b)+b\cdot E(Lap(0,a')) \\ &\quad +a\cdot E(Lap(0,b'))+E(Lap(0,a')\cdot Lap(0,b')) \\ \end{aligned}$$

We have E(Lap(0, a′)) = E(Lap(0, b′)) = 0. Moreover, \(E(Lap(0,a')\cdot Lap(0,b'))=E(Lap(0,a'))\cdot E(Lap(0,b'))=0\) since a and b are perturbed with independent Laplace noise. Hence

$$E((a+Lap(0,a'))\cdot(b+Lap(0,b'))) =E(a\cdot b)$$
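Lemmas 1 and 2 can be checked numerically with a short Monte Carlo sketch. It assumes that the second argument of Lap(0, ·) is the Laplace scale parameter (the lemmas only require zero mean, so this choice does not affect the conclusion); the helper names and the constants are illustrative.

```python
import random


def lap(scale, rng):
    """Zero-mean Laplace sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def empirical_means(a, b, scale_a, scale_b, u, v, trials, rng):
    """Average the perturbed linear combination and the perturbed product."""
    lin_sum = 0.0
    prod_sum = 0.0
    for _ in range(trials):
        a_tilde = a + lap(scale_a, rng)
        b_tilde = b + lap(scale_b, rng)  # noise drawn independently of a_tilde
        lin_sum += u * a_tilde + v * b_tilde
        prod_sum += a_tilde * b_tilde
    return lin_sum / trials, prod_sum / trials


rng = random.Random(1)
lin, prod = empirical_means(5.0, 3.0, 1.0, 2.0, 2.0, -1.0, 200_000, rng)
# Both averages should land near the noise-free values 2*5 - 3 = 7 and 5*3 = 15.
print(lin, prod)
```

The agreement is statistical rather than exact; the sample averages fluctuate around the noise-free values with an error that shrinks as the number of trials grows.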

Proof of Lemma 3

$$\begin{aligned} E\left(\frac{a+Lap(0,a')}{b+Lap(0,b')}\right) &=E\left(\frac{a}{b}\right)+E\left(\frac{a+Lap(0,a')}{b+Lap(0,b')}-\frac{a}{b}\right) \end{aligned}$$

Since E(Lap(0, a′)) = 0 and E(Lap(0, b′)) = 0, we have

$$\begin{aligned} E\left(\frac{a+Lap(0,a')}{b+Lap(0,b')}-\frac{a}{b}\right) & = E\left(\frac{(a+Lap(0,a'))\cdot b-a\cdot(b+Lap(0,b'))}{b\cdot(b+Lap(0,b'))}\right) \\ & = E\left(\frac{b\cdot Lap(0,a')}{b\cdot(b+Lap(0,b'))}\right) -E\left(\frac{a\cdot Lap(0,b')}{b\cdot(b+Lap(0,b'))}\right)\\ &=0 \end{aligned}$$

Proof of Result 1

To derive \(LS_{C_{i}}^{(s)}(G),\) we first consider the case for s = 0, i.e., \(LS_{C_{i}}(G).\)

Let G denote the original graph and G′ a neighbor graph obtained by deleting one edge from G. For a given node i, let \(N_{\Updelta}(i)\) and \(N_{3}(i)\) denote the attributes of i in G, and let \(N'_{\Updelta}(i)\) and \(N'_{3}(i)\) denote the same attributes in G′. By definition, \(0\leq N_{\Updelta}(i) \leq N_{3}(i) = d_{i}(d_{i}-1)/2.\) When an edge incident to node i is deleted from G, \(N_{\Updelta}(i)\) decreases by at most \(d_{i}-1,\) while \(N_{3}(i)\) decreases by exactly \(d_{i}-1.\) Therefore,

$$\begin{aligned} LS_{C_{i}}(G) &=\frac{N_{\Updelta}(i)}{d_{i}(d_{i}-1)/2}-\frac{N'_{\Updelta}(i)}{d_{i}(d_{i}-1)/2-(d_{i}-1)} \end{aligned} \tag{18}$$

When \(0\leq N_{\Updelta}(i)\leq d_{i}-1,\) we have

$$\begin{aligned} LS_{C_{i}}(G) &\leq \frac{N_{\Updelta}(i)}{d_{i}(d_{i}-1)/2}\leq \frac{d_{i}-1}{d_{i}(d_{i}-1)/2}=2/d_{i} \end{aligned}$$

When \(d_{i}-1 \leq N_{\Updelta}(i)\leq d_{i}(d_{i}-1)/2,\) we have

$$\begin{aligned} LS_{C_{i}}(G) &\leq \frac{N_{\Updelta}(i)}{d_{i}(d_{i}-1)/2}-\frac{N_{\Updelta}(i)-(d_{i}-1)}{d_{i}(d_{i}-1)/2-(d_{i}-1)}\\ &= \frac{2}{d_{i}-2}-\frac{4 N_{\Updelta}(i)}{d_{i}(d_{i}-1)(d_{i}-2)} \\ &\leq \frac{2}{d_{i}-2}-\frac{4(d_{i}-1)}{d_{i}(d_{i}-1)(d_{i}-2)}=2/d_{i} \end{aligned}$$

Therefore, \(LS_{C_{i}}(G)= \frac{2}{d_{i}}.\)

In the general case, for s > 0, we have, by Eq. (7),

$$LS_{C_{i}}^{(s)}(G)=\max_{G'\in D: d(G,G')\leq s} {LS_{C_{i}}(G')}=\frac{2}{d_{i}-s}$$

for \(d_{i}-s>2,\) and \(LS_{C_{i}}^{(s)}(G)=GS_{C_{i}}(G)=1\) otherwise.
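The closed form of Result 1 and the two-case bound behind it can be written as a brief Python sketch. The function names are illustrative, and `deletion_change_bound` simply evaluates the right-hand sides of the two cases above for a given triangle count and degree; it is a numerical check, not part of the paper's algorithm.

```python
def local_sensitivity_ci(degree, s=0):
    """Local sensitivity at distance s of node i's clustering coefficient (Result 1)."""
    if degree - s > 2:
        return 2.0 / (degree - s)
    return 1.0  # otherwise fall back to the global sensitivity, which is 1


def deletion_change_bound(n_tri, degree):
    """Right-hand side of the two-case bound for s = 0 (requires degree > 2)."""
    n3 = degree * (degree - 1) / 2.0
    if n_tri <= degree - 1:
        return n_tri / n3
    return n_tri / n3 - (n_tri - (degree - 1)) / (n3 - (degree - 1))


# Scan all feasible triangle counts for small degrees and confirm
# that the bound never exceeds 2 / degree.
for d in range(3, 21):
    for n_tri in range(0, d * (d - 1) // 2 + 1):
        assert deletion_change_bound(n_tri, d) <= 2.0 / d + 1e-12
```

Because the bound shrinks with the degree, high-degree nodes tolerate much smaller noise than the global sensitivity of 1 would require.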

Proof of Lemma 4

Let \(\tilde{a}=a+Lap(0,a').\) Since E(Lap(0, a′)) = 0 and \(E(Lap^{2}(0,a'))=a^{\prime 2},\) we have

$$\begin{aligned} &E((u_1\cdot \tilde{a}+v_1)(u_2\cdot \tilde{a}+v_2)) \\ &\quad=E((u_1\cdot a+v_1)(u_2\cdot a+v_2)) \\ &\qquad +(u_1\cdot(u_2\cdot a+v_2)+u_2\cdot(u_1\cdot a+v_1))E(Lap(0,a^{\prime}))\\ &\qquad +u_1\cdot u_2\cdot E(Lap^2(0,a^{\prime})) \\ &\quad=E((u_1\cdot a+v_1)(u_2\cdot a+v_2))+u_1\cdot u_2\cdot a^{\prime 2} \end{aligned}$$
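The first equality rests on a plain algebraic expansion. Writing \(X=Lap(0,a')\) as shorthand (a symbol introduced here only for illustration), so that \(\tilde{a}=a+X,\) the product expands as

$$\begin{aligned} (u_1\cdot \tilde{a}+v_1)(u_2\cdot \tilde{a}+v_2) &=(u_1\cdot a+v_1)(u_2\cdot a+v_2) \\ &\quad +\left(u_1\cdot(u_2\cdot a+v_2)+u_2\cdot(u_1\cdot a+v_1)\right)\cdot X \\ &\quad +u_1\cdot u_2\cdot X^{2} \end{aligned}$$

Taking expectations term by term removes the part that is linear in X and leaves the second-moment term, which is the source of the additive \(u_1\cdot u_2\cdot a^{\prime 2}.\)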

Proof of Result 3

We first consider the situation of s = 0,

$$\begin{aligned} LS_{C}(G)&=\max_{a_{ij}=1} \bigg\{ \frac{2}{d_{i}}+\frac{2}{d_{j}}+\sum_{a_{ik}a_{jk}=1} \frac{1}{d_{k}(d_{k}-1)/2} \bigg\} \\ &\leq \max_{a_{ij}=1}\left\{\frac{2}{d_{i}}+\frac{2}{d_{j}}+\frac{2(d_{\max}-1)}{d_{k}(d_{k}-1)}\right\} \\ &\leq 2\left(\frac{1}{2}+\frac{1}{2}+\frac{2(d_{\max}-1)}{2*(2-1)}\right) =d_{\max} \end{aligned}$$

\(LS_{C}^{(s)}(G)=LS_{C}(G)+s\) for \(s\leq b_{ij},\) because adding one edge can complete a half-built triangle involving edge (i, j), which increases the sensitivity by at most one. Meanwhile, \(LS_{C}^{(s)}(G)=LS_{C}(G)+\lfloor\frac{s+b_{ij}}{2}\rfloor\) for \(s>b_{ij},\) because once all \(b_{ij}\) half-built triangles involving edge (i, j) have been completed, two edges must be added to form each further triangle and increase the sensitivity by one. In addition, \(LS_{C}^{(s)}(G)\leq GS_{C}(G)=n-1.\) Together these give Eq. (17).
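The piecewise growth described in this proof can be sketched in Python as follows. The sketch treats \(LS_{C}(G)\) and \(b_{ij}\) as given inputs for a single edge; the function name and the example values are hypothetical, and Eq. (17) itself is not reproduced here.

```python
def ls_c_at_distance(ls_c, s, b_ij, n):
    """Local sensitivity at distance s: piecewise growth in s, capped at n - 1."""
    if s <= b_ij:
        # Each added edge can complete one half-built triangle.
        grown = ls_c + s
    else:
        # After the b_ij half-built triangles are used up, every
        # further triangle costs two added edges.
        grown = ls_c + (s + b_ij) // 2
    return min(grown, n - 1)


# Hypothetical values: LS_C(G) = 2, b_ij = 3, n = 100 nodes.
for s in (0, 1, 3, 5, 500):
    print(s, ls_c_at_distance(2.0, s, 3, 100))
```

The two branches agree at \(s=b_{ij},\) so the growth is continuous in s before the cap at the global sensitivity takes over.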

Proof of Result 4

In addition to the Proof of Result 2 given in Nissim et al. (2007), \(LS_{N_{\Updelta}}^{(s)}(G)=3\cdot\max_{i\neq j;\,j\in[n]} {c_{ij}(s)}\) because each of the two entries corresponding to vertices i and j decreases by at most \(\max_{i\neq j;\,j\in[n]} {c_{ij}(s)}\) when edge (i, j) is deleted from G. In addition, there are \(\max_{i\neq j;\,j\in[n]} {c_{ij}(s)}\) other entries, corresponding to the common neighbors of vertices i and j, whose values decrease by one.

For \(N_{3},\) we first consider the case s = 0:

$$\begin{aligned} LS_{N_{3}}(G) &= \max_{a_{ij}=1} \left\{ \frac{d_{i}(d_{i}-1)}{2}-\frac{(d_{i}-1)(d_{i}-2)}{2} + \frac{d_{j}(d_{j}-1)}{2}-\frac{(d_{j}-1)(d_{j}-2)}{2} \right\}\\ &\leq \max_{a_{ij}=1} \{d_{i}-1+d_{j}-1 \} \leq d_{\max}+d_{\rm secondmax}-2 \end{aligned}$$

In the general case, \(LS_{N_{3}}^{(s)}(G)=LS_{N_{3}}(G)+s\) for \(s\leq b_{ij}\) and \(LS_{N_{3}}^{(s)}(G)=LS_{N_3}(G)+\lfloor\frac{s+b_{ij}}{2}\rfloor\) for \(s>b_{ij},\) by the same argument as for \(LS_{C}^{(s)}(G),\) and \(LS_{N_{3}}^{(s)}(G)\leq GS_{N_{3}}(G)=2n-4.\) This gives the stated form of \(LS_{N_{3}}^{(s)}(G).\)
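The s = 0 expression for \(N_{3}\) can be checked on a small graph with a minimal Python sketch. The adjacency-map representation, the function names, and the example graph are illustrative assumptions; the brute-force check considers edge deletion only, as in the derivation above.

```python
def count_triples(adj):
    """N_3: number of connected triples, summed over nodes as d(d-1)/2."""
    return sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())


def ls_n3_formula(adj):
    """Max over edges (i, j) of (d_i - 1) + (d_j - 1), the s = 0 expression."""
    return max(len(adj[i]) - 1 + len(adj[j]) - 1 for i in adj for j in adj[i])


def ls_n3_brute_force(adj):
    """Largest drop in N_3 caused by deleting any single edge."""
    base = count_triples(adj)
    worst = 0
    for i in adj:
        for j in adj[i]:
            if i < j:  # visit each undirected edge once
                reduced = {k: set(nb) for k, nb in adj.items()}
                reduced[i].discard(j)
                reduced[j].discard(i)
                worst = max(worst, base - count_triples(reduced))
    return worst


# Small undirected example graph as an adjacency map (hypothetical data).
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(ls_n3_formula(graph), ls_n3_brute_force(graph))
```

In this example the two highest-degree nodes are adjacent, so the edge-wise maximum coincides with the upper bound \(d_{\max}+d_{\rm secondmax}-2\); when they are not adjacent, the bound can be loose.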


About this article

Cite this article

Wang, Y., Wu, X., Zhu, J. et al. On learning cluster coefficient of private networks. Soc. Netw. Anal. Min. 3, 925–938 (2013). https://doi.org/10.1007/s13278-013-0127-7


  • DOI: https://doi.org/10.1007/s13278-013-0127-7
