Node-coupling clustering approaches for link prediction

doi:10.1016/j.knosys.2015.09.014

Knowledge-Based Systems

Volume 89, November 2015, Pages 669-680

https://doi.org/10.1016/j.knosys.2015.09.014

Highlights

• Novel node-coupling clustering methods for link prediction are proposed.
• A new node-coupling degree metric is proposed.
• Node-coupling information and clustering information are both used.
• An experimental evaluation of the effectiveness of our methods is presented.

Abstract

Because real-world networks contain potentially important information, link prediction has become a focus of interest in different branches of science. Nevertheless, in the “big data” era, link prediction faces significant challenges, such as how to make predictions on massive data efficiently and accurately. In this paper, we propose two novel node-coupling clustering approaches and their extensions for link prediction, which combine the coupling degrees of the common neighbor nodes of a predicted node-pair with the cluster geometries of nodes. We then present an experimental evaluation that compares the prediction accuracy and effectiveness of our approaches with those of representative existing methods on two synthetic datasets and six real-world datasets. The experimental results show that our approaches outperform the existing methods.

Introduction

With the rapid development of internet technology, the amount of information in social networks has increased significantly, while accessing useful information from social networks has become more and more difficult [1]. Social networks contain a large amount of potentially useful information that is valuable for people’s daily lives and for social business [2]. Therefore, social network analysis (SNA) has become a research focus for mining latent useful information from massive social network data. As part of this research, how to accurately predict a potential link in a real network is an important and challenging problem in many domains, such as recommender systems, decision making and criminal investigations. For example, we can predict a potential relationship between two persons in order to recommend new relationships in the Facebook network. In general, this problem is called link prediction [3].

As a subset of link mining [4], link prediction aims to compute the existence probabilities of the missing or future links among vertices in a network [5], [6]. There are two main difficulties in the link prediction problem: (1) the huge amount of data, which requires prediction approaches to have low complexity, and (2) prediction accuracy, which requires them to predict links with high accuracy. However, traditional data mining approaches cannot solve the link prediction problem well, because they do not consider the relationships between entities, whereas the links between entities in a social network are interrelated.

To overcome the above two difficulties and meet practical requirements, many similarity-based methods have been proposed. These methods are mainly based on local analysis and global analysis [7]. The approaches based on local analysis consider only the number or the different roles of the common neighbor nodes, which results in lower time complexity. At the same time, they have lower accuracy because they use insufficient information. On the other hand, the approaches based on global analysis have higher prediction accuracy but also higher time complexity, because they access the global structure information of a network [5], [8]. Neither class of methods is therefore a satisfactory solution that overcomes both of the aforementioned difficulties.

In this paper, we propose two novel node-coupling clustering approaches and their extensions for the link prediction problem. They consider the different roles of nodes, and combine the coupling degrees of the common neighbor nodes of a predicted node-pair with the cluster geometries of nodes. Our approaches remarkably outperform the existing methods in terms of efficiency, accuracy and effectiveness, as confirmed by the experiments in Section 5.

The contributions of this paper consist of the following four aspects: (1) We propose two novel node-coupling clustering approaches and their extensions, which define a novel node-coupling degree metric. (2) We consider the coupling degrees of the common neighbor nodes of a predicted node-pair, by which some links that the existing methods cannot predict are accurately predicted. (3) We use the clustering coefficient to capture the clustering information of a network, which gives our approaches lower time complexity than the existing clustering methods. (4) We use the clustering information, which is important for predicting links, to improve the prediction accuracy. Experimental evaluation demonstrates that our approaches outperform other methods in terms of accuracy and complexity. Our approaches are very suitable for large-scale sparse networks.

The rest of this paper is organized as follows: Section 2 provides an overview of related work on link prediction. Some preliminaries are briefly introduced in Section 3. Section 4 presents the idea of our approaches and gives their complexity analysis. The experimental study is presented in Section 5. Section 6 concludes the paper and discusses future work.

Section snippets

Related work

The existing link prediction approaches can be divided into three categories: the methods based on local analysis and global analysis [7], maximum likelihood estimation methods [5], and machine learning methods [5].

The methods based on local analysis and global analysis exploit the similarity of nodes in a network. The methods based on local analysis consist of Common Neighbors (CN), Adamic Adar (AA), Preferential Attachment (PA) and Jaccard Coefficient (JC). They suppose that the nodes of a
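To make the four local indices named above concrete, the following minimal Python sketch computes them for a candidate node pair and ranks the unlinked pairs of a toy graph. It uses the standard textbook definitions of CN, JC, AA and PA, which this excerpt names but does not spell out; the helper names (`build_adjacency`, `common_neighbors`, and so on) and the toy edge list are illustrative assumptions, not the authors' code.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """CN: number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])


def jaccard_coefficient(adj, x, y):
    """JC: shared neighbors divided by the size of the neighbor union."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    """AA: shared neighbors weighted by 1 / log(degree).

    In a simple graph, a common neighbor of two distinct nodes has
    degree >= 2, so the logarithm is strictly positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """PA: product of the two node degrees."""
    return len(adj[x]) * len(adj[y])


# Hypothetical toy graph, used only for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)

# Candidate links are the currently unlinked node pairs, ranked by CN score.
candidates = [(x, y) for x, y in combinations(sorted(adj), 2) if y not in adj[x]]
ranked = sorted(candidates, key=lambda p: common_neighbors(adj, *p), reverse=True)

print(ranked[0], common_neighbors(adj, "a", "d"))  # ('a', 'd') 2
```

On this toy graph the unlinked pair (a, d) shares two neighbors, so it is ranked first under CN; the other three indices can be substituted as the sort key in the same way.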

Clustering coefficient

In graph theory, the clustering coefficient is a metric that evaluates the extent to which nodes tend to cluster together in a graph [16]. It can capture the clustering information of nodes in a graph [17]. An undirected network can be described as a graph $G = (V, E)$, where $V$ denotes the set of nodes and $E$ indicates the set of edges. $v_{i} \in V$ is a node in graph $G$. The clustering coefficient of node $v_{i}$ in graph $G$ can be defined as $C(i) = \frac{E_{i}}{k_{i} (k_{i} - 1) / 2} = \frac{2 E_{i}}{k_{i} (k_{i} - 1)}$, where $C(i)$ denotes the clustering
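The formula above can be sketched in a few lines of Python. The snippet is cut off before it defines its symbols, so this sketch assumes the standard reading: $k_{i}$ is the degree of $v_{i}$ and $E_{i}$ is the number of edges among the neighbors of $v_{i}$. The function name, the toy graph, and the convention of returning 0 for nodes with fewer than two neighbors are assumptions made for illustration.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Local clustering coefficient C(i) = 2 * E_i / (k_i * (k_i - 1)).

    adj maps each node to the set of its neighbors (undirected, simple graph).
    Assumption: nodes with fewer than two neighbors get C(i) = 0.
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # E_i: number of edges that connect two neighbors of the node.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy graph: edges a-b, a-c, b-c, b-d, c-d, d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

for node in sorted(adj):
    print(node, round(clustering_coefficient(adj, node), 3))
```

For node b, two of the three possible neighbor pairs are linked (a-c and c-d), giving $C = 2 \cdot 2 / (3 \cdot 2) = 2/3$; node a has both neighbors linked to each other, giving $C = 1$. Each node is processed from its own neighborhood alone, which is in line with the low-cost, local character the paper attributes to this metric.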

Node-coupling clustering approaches

In this section, we present our approaches for link prediction. First, we present a new node-coupling degree metric, the node-coupling clustering coefficient. Then, we present the process of our approaches. Finally, we give the complexity analysis of our approaches.

Experimental analysis

In this section, we experimentally evaluate the performance of our approaches on two synthetic datasets and six real datasets.

Conclusions and future work

In this paper, we propose node-coupling clustering approaches and their extensions for link prediction. Our approaches not only combine the coupling degrees of the common neighbor nodes with the clustering information of a network but also consider the different roles of nodes for predicting links. Experiments on two synthetic and six real datasets have shown that our approaches have comparatively good prediction results. Specifically, our approaches capture the clustering information of a

Acknowledgments

The work presented in this paper has been partially supported by the National Natural Science Foundation of China (Grant Nos. 61272480, 61332013, 71072172, 71110107026 and 71331005) and the Australian Research Council Discovery Projects (Grant No. DP140100841).

References (26)

L. Lü et al., Link prediction in complex networks: a survey, Phys. A: Stat. Mech. Appl. (2011)
F. Li et al., A clustering-based link prediction method in social networks, Proc. Comput. Sci. (2014)
K. Musiał et al., Creation and growth of online social network, World Wide Web (2013)
K. Musiał et al., Social networks on the internet, World Wide Web (2013)
L. Getoor, Link mining: a new data mining challenge, ACM SIGKDD Explor. Newslett. (2003)
L. Getoor et al., Link mining: a survey, ACM SIGKDD Explor. Newslett. (2005)
B. Taskar, M.-f. Wong, P. Abbeel, D. Koller, Link prediction in relational data, in: Advances in Neural Information...
D. Liben-Nowell et al., The link prediction problem for social networks, J. Am. Soc. Inf. Sci. Technol. (2007)
Z. Liu et al., Link prediction in complex networks: a local naïve Bayes model, EPL (Europhys. Lett.) (2011)
W. Liu et al., Link prediction based on local random walk, EPL (Europhys. Lett.) (2010)
T. Zhou et al., Predicting missing links via local information, Eur. Phys. J. B – Condens. Matter Complex Syst. (2009)
A. Clauset et al., Hierarchical structure and the prediction of missing links in networks, Nature (2008)
R. Guimerà et al., Missing and spurious interactions and the reconstruction of complex networks, Proc. Nat. Acad. Sci. (2009)
