Ellipsoidal neighbourhood outlier factor for distributed anomaly detection in resource constrained networks
Introduction
Resource constrained networks, such as wireless sensor networks (WSNs), are composed of compact, cheap, intelligent sensor nodes. They can collect measurements from sensors distributed across a large physical area at a lower cost of building and operating the network than a traditional wired network. These benefits come with a trade-off in terms of limited resources, specifically the battery life of sensor nodes. By detecting interesting or unusual events in sensor networks, we can avoid transmitting uninformative or erroneous measurements. This can reduce the energy consumed in the network and improve the reliability of the data collected. Detecting interesting or unusual events in the network is known as the anomaly detection problem.
An anomaly or outlier in a data set is defined as “an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data” [1]. To identify anomalies in a data set, we first find a model of the normal data; anomalies are then the data vectors that deviate significantly from this normal model. Anomalies in sensor networks can arise from faulty sensors, gradual drift in the sensor elements, loss of calibration, noisy sensor transducers, movement of sensors, changes in the observed phenomena, or malicious attacks such as false data attacks. A key challenge in WSNs is to identify any misbehaviour or interesting events (anomalies) with high detection accuracy and minimal communication overhead within the network.
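The idea of "a model of the normal data" with a hyperellipsoid as the model can be sketched in a few lines of Python with NumPy. This is a minimal single-ellipsoid illustration under stated assumptions, not the paper's multi-ellipsoid HyCARCE model: the normal data are summarised by their sample mean and covariance, and a vector is flagged when its squared Mahalanobis distance from the centre exceeds an assumed threshold `t`. The function names are hypothetical.

```python
import numpy as np

def fit_ellipsoid(X):
    """Summarise normal data as a hyperellipsoid: sample mean and covariance."""
    X = np.asarray(X, dtype=float)
    return X.mean(axis=0), np.cov(X, rowvar=False)

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from the ellipsoid centre."""
    diff = np.atleast_2d(np.asarray(X, dtype=float)) - mean
    inv_cov = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

def flag_anomalies(X, mean, cov, t=9.21):
    """Flag vectors outside the ellipsoid {x : (x - mean)^T cov^-1 (x - mean) <= t}.

    The default t = 9.21 is approximately the 99% quantile of a chi-square
    distribution with 2 degrees of freedom: an assumed threshold suited to
    roughly Gaussian data with p = 2 attributes.
    """
    return mahalanobis_sq(X, mean, cov) > t

# Synthetic "normal" measurements with p = 2 attributes.
rng = np.random.default_rng(0)
normal_data = rng.normal(size=(500, 2))
mean, cov = fit_ellipsoid(normal_data)
print(flag_anomalies([[0.0, 0.0], [6.0, 6.0]], mean, cov))  # origin kept, (6, 6) flagged
```

A single ellipsoid cannot represent multi-modal data well, which is why the architecture described here models each node's data with multiple hyperellipsoids.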
In this paper, we propose a novel anomaly scoring mechanism and a distributed anomaly detection architecture for resource constrained networks. Our main contributions in this paper are as follows.
1. We propose a novel anomaly scoring scheme called the hyperEllipsoidal Neighborhood Outlier Factor (ENOF). It assigns an outlier factor (score) to each hyperellipsoidal model of the sensor data at each sensor node, and identifies the globally anomalous set of hyperellipsoids.
2. Using the ENOF, we propose a distributed anomaly detection architecture that can identify anomalies anywhere in the network. The architecture has the following characteristics:
   (a) It models the sensor data collected at each node using multiple hyperellipsoids, which allows multi-modal distributions in the sensor data to be captured. We use our previously proposed HyCARCE algorithm [2] (see Section 4.1) and our ellipsoidal similarity measure [3] (see Section 4.2.1) to automatically infer the set of hyperellipsoidal clusters at each node.
   (b) Only summary information about the hyperellipsoids needs to be communicated among the nodes (along the sensor network routing hierarchy) to infer a set of global hyperellipsoids that form a global model of normal behaviour. Compared with communicating all the raw measurements to a central gateway for analysis, our approach vastly reduces the communication complexity in the network and helps prolong the network lifetime.
   (c) In addition to the locally anomalous data vectors, it identifies globally anomalous data vectors at the node level, based on the ENOF score and the globally anomalous hyperellipsoids that parent nodes communicate to their child nodes. The architecture is complete in the sense that it can detect all local and global anomalies down to the level of individual nodes.
3. We evaluate our scheme on several real and synthetic data sets and demonstrate that it achieves detection accuracy comparable to that of centralised and other existing schemes while incurring significantly less communication overhead in the network.
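Contribution 1 scores each hyperellipsoid according to how it sits relative to its neighbouring ellipsoids. This excerpt does not give the ENOF formula, so the sketch below is only a stand-in that borrows the ratio form of the classical Local Outlier Factor (LOF). Assumptions, all hypothetical: each ellipsoid is a (centre, covariance) pair; the inter-ellipsoid distance is a symmetric Mahalanobis distance between centres, not the authors' ellipsoidal similarity measure [3]; and `k` is an assumed neighbourhood size.

```python
import numpy as np

def ellipsoid_distance(m1, S1, m2, S2):
    """Hypothetical symmetric distance between two hyperellipsoids (mean, covariance):
    the average of the Mahalanobis distances between their centres, measured
    under each ellipsoid's own covariance."""
    d = np.asarray(m1, dtype=float) - np.asarray(m2, dtype=float)
    d1 = np.sqrt(d @ np.linalg.solve(S1, d))
    d2 = np.sqrt(d @ np.linalg.solve(S2, d))
    return 0.5 * (d1 + d2)

def neighbourhood_scores(ellipsoids, k=2):
    """LOF-style score for each ellipsoid (requires k < number of ellipsoids).

    score_i = (mean distance from i to its k nearest ellipsoids)
              / (average of that same quantity over those k neighbours).
    Scores near 1 mean an ellipsoid is no more isolated than its neighbours;
    scores well above 1 mark candidate globally anomalous ellipsoids.
    """
    n = len(ellipsoids)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = ellipsoid_distance(*ellipsoids[i], *ellipsoids[j])
    np.fill_diagonal(D, np.inf)                  # an ellipsoid is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]          # indices of the k nearest ellipsoids
    knn_mean = np.take_along_axis(D, nbrs, axis=1).mean(axis=1)
    return knn_mean / knn_mean[nbrs].mean(axis=1)

# Four unit-covariance ellipsoids on a unit square and one far away at (10, 10).
I2 = np.eye(2)
models = [(np.array(c, dtype=float), I2) for c in [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]]
print(np.round(neighbourhood_scores(models, k=2), 2))
```

In this toy example the four square ellipsoids each score 1 and the distant ellipsoid scores about 13, so a score threshold would single it out. Using a ratio rather than a raw distance makes the score relative to local density, which is the property that motivates neighbourhood-based outlier factors in general.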
Our proposed outlier scoring methodology and the anomaly detection architecture have many potential applications. These include monitoring the health of coral reefs such as the Great Barrier Reef, Australia [4], monitoring and identifying inefficient energy consumption behaviour in an office or residential (indoor) environment [5], and Internet of Things (IoT) [6] applications such as Smart City monitoring for noise, pollution and environmental parameters [7], [8], [9], [10], [11].
The rest of the paper is organised as follows. Section 2 describes different kinds of anomalies in WSNs, followed by a review of existing anomaly detection methods in Section 3. Our proposed architecture and the ENOF are presented in Section 4. Complexity analysis is given in Section 5. Evaluation and comparison of the algorithms using several real and synthetic datasets is presented in Section 6, followed by the conclusion and future work in Section 7.
Network topology and local and global anomalies
The network topology of WSNs and the composition of sensor node types used in the network are heavily dependent on their application. One flexible arrangement of sensor nodes is a clustered or multi-tiered hierarchical topology in which a parent–child relationship exists, such as the one shown in Fig. 2(a). We use this hierarchical topology to describe the different kinds of anomalies found in a sensor network and to propose a distributed detection approach. However, our scheme is applicable to
Related work
The task of detecting interesting or unusual events in a general manner is an open problem in the data mining community, and is often referred to as the anomaly detection problem. Several alternative definitions of anomalies have been proposed in the literature, as well as a variety of detection algorithms [13], [14], [15], [16]. Anomaly or outlier detection techniques that do not assume any prior knowledge about the distribution of the data are called non-parametric anomaly detection
Distributed architecture for automatic anomaly detection
We consider a set of sensor nodes with a hierarchical topology as shown in Fig. 2. The sensors are deployed in an environment in which the measurements have an unknown distribution. At every time interval, each sensor node $s_j$ measures a data vector $\mathbf{x}_k^j$ (which we sometimes refer to as a measurement). Each data vector is composed of $p$ attributes $v_k^{ij}$, where $\mathbf{x}_k^j = \{v_k^{ij} : i = 1,\ldots,p\}$ and $v_k^{ij} \in \mathbb{R}$. After a window of $n$ measurements, each sensor $s_j$ has collected a set of measurements
Complexity analysis
Our distributed approach comprises several components, namely HyCARCE clustering, the ENOF calculation and anomaly detection. We analyse the complexities of each of these components as follows.
Computational complexity: The HyCARCE clustering algorithm incurs a complexity of O(n) [2], where n is the number of data vectors. The computation of the focal similarity measure, for each ellipsoid, requires an eigendecomposition of the covariance matrix for the computation of the focal points and the
Evaluation
We perform two sets of evaluations. First, we evaluate the ENOF scoring mechanism using real WSN data sets, namely the HI, GSB and REDUCE data sets. Second, we compare the performance of the distributed approach with the centralised approach and other existing schemes in the literature. We use two synthetic data sets and two real WSN deployment data sets for this purpose, namely the Banana, Gaussmix, IBRL and GDI data sets.
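The communication saving that this comparison targets can be estimated with simple arithmetic. The sketch below assumes a hypothetical encoding in which each hyperellipsoid is sent as its centre, the upper triangle of its symmetric covariance matrix and a count; the paper's actual message format and measured overheads are not given in this excerpt, and the numbers below are illustrative, not evaluation results.

```python
def raw_values(n, p):
    """Values sent if a node forwards all n raw p-dimensional measurements."""
    return n * p

def summary_values(num_ellipsoids, p):
    """Values sent if each hyperellipsoid is summarised by (assumed encoding):
    its centre (p values), the upper triangle of its symmetric p x p
    covariance matrix (p(p+1)/2 values) and the number of vectors it covers (1)."""
    return num_ellipsoids * (p + p * (p + 1) // 2 + 1)

# Hypothetical window: 1000 measurements of 4 attributes, summarised by 3 ellipsoids.
print(raw_values(1000, 4))   # 4000
print(summary_values(3, 4))  # 45
```

Under this encoding the summary cost depends only on the number of ellipsoids and the dimension p, not on the window length n, which is why summarising locally pays off as windows grow.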
ENOF evaluation: We considered three real WSN deployment data sets for
Conclusion
In this paper, we propose an anomaly scoring mechanism for hyperellipsoidal clusters, along with a distributed anomaly detection algorithm for resource constrained networks. The scheme is capable of identifying local and global anomalies at the level of individual nodes. The evaluation on several real and synthetic data sets shows that the distributed approach achieves detection performance comparable to that of a centralised approach, while achieving a significant reduction in communication
Conflict of interest statement
None declared.
Acknowledgments
We acknowledge the support of the REDUCE project grant (EP/I000232/1) under the Digital Economy Programme run by Research Councils UK, a cross-council initiative led by EPSRC and contributed to by AHRC, ESRC and MRC; the Australian Research Council (ARC) Research Network on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP); and the ARC Grants LP120100529 and LE120100129.
References (36)
- et al., An efficient hyperellipsoidal clustering algorithm for resource-constrained environments, Pattern Recognit. (2011).
- et al., Clustering ellipses for anomaly detection, Pattern Recognit. (2011).
- et al., Internet of Things (IoT): a vision, architectural elements, and future directions, Future Gener. Comput. Syst. (2013).
- et al., Clustering distributed data streams in peer-to-peer environments, Inf. Sci. (2006).
- et al., Hyperspherical cluster based distributed anomaly detection in wireless sensor networks, J. Parallel Distrib. Comput. (2014).
- et al., Outliers in Statistical Data (1994).
- et al., Anomaly detection in environmental monitoring networks, IEEE Comput. Intell. Mag. (2011).
- L. Rashidi, S. Rajasegarar, C. Leckie, M. Nati, A. Gluhak, M.A. Imran, et al., Profiling spatial and temporal behaviour...
- Internet of Things (IoT), 2013,...
- et al., High-Resolution Monitoring of Atmospheric Pollutants Using a System of Low-Cost Sensors, IEEE Trans. Geosci. Remote Sens. (2014).
- Elliptical anomalies in wireless sensor networks, ACM Trans. Sens. Netw.
- Anomaly detection: a survey, ACM Comput. Surv.
- Anomaly detection in wireless sensor networks, IEEE Wirel. Commun.
Cited by (35)
Intrusion detection systems in the Internet of things: A comprehensive investigation
2019, Computer Networks. Citation excerpt: The important disadvantages of the method are that it has a low combination with the IoT and it is not real-time intrusion detection. Rajasegarar et al. [79] have proposed a distributed anomaly detection architecture which utilizes numerous hyper ellipsoidal groups to show the information at every node and detect the global and neighborhood abnormal behavior in the system. Specifically, a score is given to each hyper ellipsoidal model in the anomaly scoring technique by calculating the distance of the ellipsoid with their neighbors.
Modification of supervised OPF-based intrusion detection systems using unsupervised learning and social network concept
2017, Pattern Recognition. Citation excerpt: Constant FAR outlier detection, using a supervised method based on normalized residual values, was proposed by Ru et al. [11]. Rajasegarar et al. [12] proposed distributed anomaly detection architecture for modeling the data at each sensor in the network. This architecture used multiple hyperellipsoidal clusters and detected global and local anomalies.
Internet of Things: An overview
2016, Internet of Things: Principles and Paradigms
A New Outlier Detection Method for Anomaly Detection in IoT-Enabled Distribution Networks
2023, Ad-Hoc and Sensor Wireless Networks
A Comprehensive System for Smart Homes with a Minimalist Information Security Framework
2023, Lecture Notes in Networks and Systems
Sutharshan Rajasegarar received his B.Sc. Engineering degree in Electronic and Telecommunication Engineering in 2002, from the University of Moratuwa, Sri Lanka, and the Ph.D. degree in 2009 from the University of Melbourne, Australia. He is currently a Research Fellow with the Department of Electrical and Electronic Engineering, the University of Melbourne, Australia. His research interests include distributed anomaly/outlier detection, Internet of Things (IoT), wireless sensor networks, machine learning, pattern recognition, signal processing and wireless communication.
Alexander Gluhak is a senior research fellow at CCSR, where he coordinates experimental IoT-related research activities. He completed a Dipl.-Ing.(FH) degree at the University of Applied Sciences in Offenburg, Germany, in 2002 and a Ph.D. degree at the University of Surrey in 2006. He has led technical work in several large European research projects on the IoT, such as e-SENSE, SENSEI and SmartSantander. His current research focuses on exploiting machine learning and pervasive computing/IoT technologies for an increased machine understanding of human behaviour.
Muhammad Ali Imran obtained his B.Sc. in Electrical Engineering (Electronics and Communications) from the University of Engineering & Technology, Lahore, in 1999. He obtained his M.Sc. degree in Communications and Signal Processing from Imperial College London in 2002. He was awarded the Diploma of Imperial College and the Ph.D. degree from Imperial College and the University of London in 2007. In 2009, he joined the Faculty of Engineering and Physical Sciences (FEPS) at the University of Surrey as a Lecturer in Communications, and coordinated the University of Surrey team involved in the FP7 EC-funded project EARTH. Currently, he is leading an EPSRC-funded project, REDUCE, which aims to reshape the energy demand of end users using ICT and economic incentives. He is a work package leader for IU-ATC phase II, funded by EPSRC. He is co-supervising the Huawei-funded Green Communications project and Sony-funded machine-to-machine communication projects. He is actively participating in the EU-funded projects LexNet and iJoin.
Michele Nati is currently a research fellow at the Centre for Communication System Research (CCSR), University of Surrey. He received an M.Sc. degree in Computer Engineering (2003) and a Ph.D. in Computer Science (2008), both from the University of Rome ‘La Sapienza’. During his Ph.D., he spent one year as a visiting researcher at the Electronics and Computer Engineering Department of Northeastern University, Boston. Before moving to Surrey, he was a post-doctoral researcher at Consorzio Ferrara Ricerche and a research engineer at NEXSE/WLAB, a company based in Rome. His research interests include the design, analysis, evaluation and implementation of efficient cross-layer protocols for wireless sensor networks, with special emphasis on real deployments.
Masud Moshtaghi received the B.Sc. degree in 2006 in computer science, and the M.Sc. in software engineering in 2008 from The University of Tehran. He joined The University of Melbourne in 2009. His research interests include artificial intelligence for network security, pattern recognition and data mining.
Christopher Leckie is a professor at the Department of Computing and Information Systems, the University of Melbourne, Australia. He received the B.Sc. degree in 1985, the B.E. degree in electrical and computer systems engineering in 1987, and the Ph.D. degree in computer science in 1992, all from Monash University, Australia. He is currently the deputy director of NICTA Victoria Research Laboratory. His research interests include scalable data mining, network intrusion detection, wireless sensor networks, artificial intelligence (AI), telecommunications, machine learning, fault diagnosis, distributed systems and design automation.
Marimuthu Palaniswami is a professor at the Department of Electrical and Electronic Engineering, the University of Melbourne, Australia. He received his B.E. (Hons) from the University of Madras, India, in 1977, M.E. from the Indian Institute of Science, India, in 1979, M.Eng.Sc. from the University of Melbourne in 1983 and Ph.D. from the University of Newcastle, Australia, in 1987. He currently leads the ARC Research Network on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP) programme. His research interests include Internet of Things (IoT), SVMs, Sensors and Sensor Networks, Machine Learning, Neural Networks, Pattern Recognition, Signal Processing and Control.