Elsevier

Knowledge-Based Systems

Volume 89, November 2015, Pages 669-680

Node-coupling clustering approaches for link prediction

https://doi.org/10.1016/j.knosys.2015.09.014

Highlights

  • Novel node-coupling clustering methods for link prediction are proposed.

  • A new node coupling degree metric is proposed.

  • The node coupling information and clustering information are used.

  • An experimental evaluation of the effectiveness of our methods is presented.

Abstract

Because real-world networks contain potentially important information, link prediction has become a focus of interest in different branches of science. Nevertheless, in the “big data” era, link prediction faces significant challenges, such as how to make predictions on massive data efficiently and accurately. In this paper, we propose two novel node-coupling clustering approaches and their extensions for link prediction, which combine the coupling degrees of the common neighbor nodes of a predicted node-pair with cluster geometries of nodes. We then present an experimental evaluation that compares the prediction accuracy and effectiveness of our approaches with those of representative existing methods on two synthetic datasets and six real-world datasets. The experimental results show that our approaches outperform the existing methods.

Introduction

With the rapid development of internet technology, the amount of information in social networks has increased significantly, while accessing useful information from social networks has become more and more difficult [1]. Social networks contain a large amount of potentially useful information that is valuable for people’s daily lives and for social business [2]. Therefore, social network analysis (SNA) has become a research focus for mining latent useful information from massive social network data. As part of this research, how to accurately predict a potential link in a real network is an important and challenging problem in many domains, such as recommender systems, decision making and criminal investigations. For example, we can predict a potential relationship between two persons in order to recommend new relationships in the Facebook network. In general, this problem is called link prediction [3].

As a subset of link mining [4], link prediction aims to compute the existence probabilities of the missing or future links among vertices in a network [5], [6]. There are two main difficulties in the link prediction problem: (1) the huge amount of data, which requires prediction approaches to have low complexity, and (2) prediction accuracy, which requires prediction approaches to be highly accurate. However, traditional data mining approaches cannot solve the link prediction problem well because they do not consider the relationships between entities, whereas the links between entities in a social network are interrelated.

To overcome the above two difficulties and meet practical requirements, many similarity-based methods have been proposed. These methods are mainly based on local analysis and global analysis [7]. The approaches based on local analysis consider only the number or the different roles of the common neighbor nodes, which results in lower time complexity. At the same time, they have lower accuracy because they use insufficient information. On the other hand, the approaches based on global analysis have higher prediction accuracy but also higher time complexity, because they access the global structure information of a network [5], [8]. Neither family of methods is therefore a satisfactory solution to the two aforementioned difficulties.

In this paper, we propose two novel node-coupling clustering approaches and their extensions for the link prediction problem. They consider the different roles of nodes, and combine the coupling degrees of the common neighbor nodes of a predicted node-pair with cluster geometries of nodes. Our approaches remarkably outperform the existing methods in terms of efficiency, accuracy and effectiveness. This is confirmed by experiments in Section 5.

The contributions of this paper consist of the following four aspects: (1) We propose two novel node-coupling clustering approaches and their extensions, which define a novel node-coupling degree metric. (2) We consider the coupling degrees of the common neighbor nodes of a predicted node-pair, by which some links that the existing methods cannot predict are accurately predicted. (3) We use the clustering coefficient to capture the clustering information of a network, which gives our approaches lower time complexity than the existing clustering methods. (4) We use the clustering information, which is important for predicting links and can improve the prediction accuracy. Experimental evaluation demonstrates that our approaches outperform other methods in terms of accuracy and complexity. Our approaches are well suited to large-scale sparse networks.

The rest of this paper is organized as follows: Section 2 provides an overview of related work on link prediction. Some preliminaries are briefly introduced in Section 3. Section 4 presents the idea of our approaches and gives their complexity analysis. The experimental study is presented in Section 5. Section 6 concludes this paper and discusses future work.

Section snippets

Related work

The existing link prediction approaches can be divided into three categories: the methods based on local analysis and global analysis [7], maximum likelihood estimation methods [5], and machine learning methods [5].

The methods based on local analysis and global analysis exploit the similarity of nodes in a network. The methods based on local analysis consist of Common Neighbors (CN), Adamic Adar (AA), Preferential Attachment (PA) and Jaccard Coefficient (JC). They suppose that the nodes of a
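As a rough illustration of the four local indices named above, the following sketch computes them with their commonly used textbook definitions. The adjacency-dictionary representation, the function names and the toy graph are assumptions made for this example only; they are not taken from the paper.

```python
import math

# Hypothetical toy graph: an undirected network stored as node -> set of neighbors.
# Edges: a-c, a-d, b-c, b-d, c-d.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}

def common_neighbors(adj, x, y):
    """CN: number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])

def adamic_adar(adj, x, y):
    """AA: shared neighbors weighted by 1 / log(degree)."""
    return sum(
        1.0 / math.log(len(adj[z]))
        for z in adj[x] & adj[y]
        if len(adj[z]) > 1  # guard against log(1) = 0
    )

def preferential_attachment(adj, x, y):
    """PA: product of the degrees of x and y."""
    return len(adj[x]) * len(adj[y])

def jaccard(adj, x, y):
    """JC: shared neighbors divided by the union of both neighborhoods."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

# Score the unconnected pair (a, b) with each index.
scores = {
    "CN": common_neighbors(adj, "a", "b"),
    "AA": adamic_adar(adj, "a", "b"),
    "PA": preferential_attachment(adj, "a", "b"),
    "JC": jaccard(adj, "a", "b"),
}
```

Each index only inspects the neighborhoods of the two end nodes, which is why local methods stay cheap on large sparse networks.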

Clustering coefficient

In graph theory, the clustering coefficient is a metric that can evaluate the extent to which nodes tend to cluster together in a graph [16]. It can capture the clustering information of nodes in a graph [17]. An undirected network can be described as a graph $G=(V,E)$, where $V$ denotes the set of nodes and $E$ indicates the set of edges. $v_i \in V$ is a node in Graph $G$. The clustering coefficient of node $v_i$ in Graph $G$ can be defined as $C(i) = \frac{E_i}{(k_i \cdot (k_i - 1))/2} = \frac{2 \cdot E_i}{k_i \cdot (k_i - 1)}$, where $C(i)$ denotes the clustering
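The clustering coefficient defined above can be sketched in a few lines. This example assumes the standard reading of the symbols, with E_i as the number of edges among the neighbors of node v_i and k_i as its degree; the snippet is truncated before it defines them, so this reading is an assumption. The adjacency dictionary, the function name and the choice to return 0.0 for nodes with fewer than two neighbors are likewise assumptions made for illustration.

```python
# Hypothetical toy graph: an undirected network stored as node -> set of neighbors.
# Edges: a-c, a-d, b-c, b-d, c-d.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}

def clustering_coefficient(adj, i):
    """Return C(i) = 2 * E_i / (k_i * (k_i - 1)) for node i."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        # Assumed convention: the ratio is undefined for k < 2, so report 0.0.
        return 0.0
    # Each edge among the neighbors is seen from both endpoints, hence // 2.
    links = sum(1 for u in neighbors for v in adj[u] if v in neighbors) // 2
    return 2.0 * links / (k * (k - 1))

c_a = clustering_coefficient(adj, "a")  # neighbors c and d are linked
c_c = clustering_coefficient(adj, "c")  # 2 of 3 possible neighbor pairs are linked
```

Because the computation touches only a node and its direct neighbors, it needs no global pass over the network.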

Node-coupling clustering approaches

In this section, we present our approaches for link prediction. First, we present a new node-coupling degree metric, the node-coupling clustering coefficient. Then, we describe the process of our approaches. Finally, we give their complexity analysis.

Experimental analysis

In this section, we experimentally evaluate the performance of our approaches on two synthetic datasets and six real datasets.

Conclusions and future work

In this paper, we propose node-coupling clustering approaches and their extensions for link prediction. Our approaches not only combine the coupling degrees of the common neighbor nodes with the clustering information of a network but also consider the different roles of nodes for predicting links. Experiments on two synthetic and six real datasets have shown that our approaches have comparatively good prediction results. Specifically, our approaches capture the clustering information of a

Acknowledgments

The work presented in this paper has been partially supported by the National Natural Science Foundation of China (Grant Nos. 61272480, 61332013, 71072172, 71110107026 and 71331005) and the Australian Research Council Discovery Projects (Grant No. DP140100841).

References (26)

  • L. Lü et al., Link prediction in complex networks: a survey, Phys. A: Stat. Mech. Appl. (2011)
  • F. Li et al., A clustering-based link prediction method in social networks, Proc. Comput. Sci. (2014)
  • K. Musiał et al., Creation and growth of online social network, World Wide Web (2013)
  • K. Musiał et al., Social networks on the internet, World Wide Web (2013)
  • L. Getoor, Link mining: a new data mining challenge, ACM SIGKDD Explor. Newslett. (2003)
  • L. Getoor et al., Link mining: a survey, ACM SIGKDD Explor. Newslett. (2005)
  • B. Taskar, M.-f. Wong, P. Abbeel, D. Koller, Link prediction in relational data, in: Advances in Neural Information...
  • D. Liben-Nowell et al., The link prediction problem for social networks, J. Am. Soc. Inf. Sci. Technol. (2007)
  • Z. Liu et al., Link prediction in complex networks: a local naïve Bayes model, EPL (Europhys. Lett.) (2011)
  • W. Liu et al., Link prediction based on local random walk, EPL (Europhys. Lett.) (2010)
  • T. Zhou et al., Predicting missing links via local information, Eur. Phys. J. B – Condens. Matter Complex Syst. (2009)
  • A. Clauset et al., Hierarchical structure and the prediction of missing links in networks, Nature (2008)
  • R. Guimerà et al., Missing and spurious interactions and the reconstruction of complex networks, Proc. Nat. Acad. Sci. (2009)