Energy-based anomaly detection for mixed data

Do, Kien; Tran, Truyen; Venkatesh, Svetha

doi:10.1007/s10115-018-1168-z

Regular Paper
Published: 12 February 2018

Volume 57, pages 413–435, (2018)

Knowledge and Information Systems

Abstract

Anomalies are data points that deviate significantly from the norm. Thus, anomaly detection amounts to finding data points located far away from their neighbors, i.e., those lying in low-density regions. Classic anomaly detection methods are largely designed for a single data type, such as continuous or discrete. However, real-world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Mixed data poses multiple challenges, including (a) capturing the inter-type correlation structures and (b) measuring deviation from the norm under multiple types. These challenges are exacerbated in (c) high-dimensional regimes. In this paper, we propose a new scalable unsupervised anomaly detection method for mixed data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that estimates the density of mixed data. We propose to use the free energy derived from Mv.RBM as the anomaly score, as it is identical to the negative log-density of the data up to an additive constant. We then extend this method to detect anomalies across multiple levels of data abstraction, an effective approach to dealing with high-dimensional settings. The extension is dubbed \(\mathtt {MIXMAD}\), which stands for MIXed data Multilevel Anomaly Detection. In \(\mathtt {MIXMAD}\), we sequentially construct an ensemble of mixed-data Deep Belief Nets (DBNs) with varying depths. Each DBN is an energy-based detector at a predefined abstraction level. Predictions across the ensemble are finally combined via a simple rank aggregation method. The proposed methods are evaluated on a comprehensive suite of synthetic and real high-dimensional datasets. The results demonstrate that for anomaly detection, (a) a proper handling of mixed types is necessary, (b) free energy is a powerful anomaly scoring method, (c) multilevel abstraction of data is important for high-dimensional data, and (d) empirically Mv.RBM and \(\mathtt {MIXMAD}\) are superior to popular unsupervised detection methods for both homogeneous and mixed data.
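The claim that free energy equals the negative log-density up to an additive constant can be checked numerically on a toy model. The following is a minimal sketch, assuming a plain binary RBM rather than the paper's Mv.RBM; the function name `free_energy`, the toy sizes, and the randomly drawn parameters `a`, `b`, `W` are illustrative assumptions, not taken from the article. With only four visible units, the partition function can be computed by brute force, so the constant offset is visible directly.

```python
import itertools
import numpy as np

def free_energy(v, a, b, W):
    """Free energy of a binary RBM (hidden units summed out analytically).

    F(v) = -a.v - sum_k log(1 + exp(b_k + (v W)_k))
    """
    v = np.atleast_2d(v).astype(float)
    pre_activation = b + v @ W                    # shape (n_samples, n_hidden)
    softplus = np.logaddexp(0.0, pre_activation)  # numerically stable log(1 + e^x)
    return -(v @ a) - softplus.sum(axis=1)

# Hypothetical toy parameters (assumption: randomly drawn, not trained).
rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 3
a = rng.normal(size=n_visible)             # visible biases
b = rng.normal(size=n_hidden)              # hidden biases
W = rng.normal(size=(n_visible, n_hidden)) # visible-hidden weights

# Enumerate all 2^4 visible configurations so log Z is exact.
states = np.array(list(itertools.product([0, 1], repeat=n_visible)), dtype=float)
energies = free_energy(states, a, b, W)
log_Z = np.logaddexp.reduce(-energies)     # log of the partition function
neg_log_density = energies + log_Z         # -log p(v) = F(v) + log Z

# Because the offset log Z is the same for every point, ranking by free
# energy gives the same anomaly ordering as ranking by negative log-density.
most_anomalous_first = np.argsort(-energies)
```

In realistic dimensions the partition function is intractable, which is why a score that does not require it is convenient: only the ordering of points matters for detection.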

Notes

A preliminary version of this paper has been published in [16].
The original Mv.RBM also covers rank data, but we do not consider it in this paper.
http://yann.lecun.com/exdb/mnist/.
https://archive.ics.uci.edu/ml/datasets.html.

References

Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional space. In: International conference on database theory, Springer, pp 420–434
Aggarwal CC, Sathe S (2015) Theoretical foundations and algorithms for outlier ensembles. ACM SIGKDD Explor Newsl 17(1):24–47
Akoglu L, Tong H, Vreeken J, Faloutsos C (2012) Fast and reliable anomaly detection in categorical data. In: Proceedings of the 21st ACM international conference on information and knowledge management, ACM, pp 415–424
Angiulli F, Pizzuti C (2002) Fast outlier detection in high dimensional spaces. In: European conference on principles of data mining and knowledge discovery, Springer, pp 15–27
Becker J, Havens TC, Pinar A, Schulz TJ (2015) Deep belief networks for false alarm rejection in forward-looking ground-penetrating radar. In: SPIE defense+ security, International Society for Optics and Photonics, pp 94540W–94540W
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828
Bontemps L, McDermott J, Le-Khac NA et al (2016) Collective anomaly detection based on long short-term memory recurrent neural networks. In: International conference on future data and security engineering, Springer, pp 141–152
Bouguessa M (2015) A practical outlier detection approach for mixed-attribute data. Expert Syst Appl 42(22):8637–8649
Breunig MM, Kriegel HP, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: ACM sigmod record, vol 29. ACM, pp 93–104
Campos GO, Zimek A, Sander J, Campello RJGB, Micenková B, Schubert E, Assent I, Houle ME (2015) On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Discov 30(4):891–927
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):15
Chauhan S, Vig L (2015) Anomaly detection in ECG time signals via deep long short-term memory networks. In: IEEE international conference on data science and advanced analytics (DSAA), 2015, IEEE, pp 1–7
Cheng M, Xu Q, Lv J, Liu W, Li Q, Wang J (2016) MS-LSTM: a multi-scale LSTM model for BGP anomaly detection. In: IEEE 24th international conference on network protocols (ICNP), 2016, IEEE, pp 1–6
Das K, Schneider J, Neill DB (2008) Anomaly pattern detection in categorical datasets. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 169–176
De Leon AR, Chough KC (2013) Analysis of mixed data: methods & applications. CRC Press, Boca Raton
Do K, Tran T, Phung D, Venkatesh S (2016) Outlier detection on mixed-type data: an energy-based approach. In: International conference on advanced data mining and applications (ADMA 2016)
Fiore U, Palmieri F, Castiglione A, De Santis A (2013) Network anomaly detection with the restricted Boltzmann machine. Neurocomputing 122:13–23
Gao N, Gao L, Gao Q, Wang H (2014) An intrusion detection model based on deep belief networks. In: Second international conference on advanced cloud and big data (CBD), 2014, IEEE, pp 247–252
Ghoting A, Otey ME, Parthasarathy S (2004) Loaded: link-based outlier and anomaly detection in evolving data sets. In: ICDM, pp 387–390
Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Comput 14:1771–1800
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Ienco D, Pensa RG, Meo R (2016) A semisupervised approach to the detection and characterization of outliers in categorical data. IEEE Trans Neural Netw Learn Syst 28(5):1017–1029
Kamyshanska H, Memisevic R (2015) The potential energy of an autoencoder. IEEE Trans Pattern Anal Mach Intell 37(6):1261–1273
Kingma D, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Koufakou A, Georgiopoulos M (2010) A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes. Data Min Knowl Discov 20(2):259–289
Koufakou A, Georgiopoulos M, Anagnostopoulos GC (2008) Detecting outliers in high-dimensional datasets with mixed attributes. In: DMIN, Citeseer, pp 427–433
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Lu YC, Feng C, Yating W, Lu CT (2016) Discovering anomalies on mixed-type data using a generalized student-t based approach. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2016.2583429
Malhotra P, Vig L, Shroff G, Agarwal P (2015) Long short term memory networks for anomaly detection in time series. In: Proceedings of ESANN, Presses universitaires de Louvain, pp 89–94
Mehta P, Schwab DJ (2014) An exact mapping between the variational renormalization group and deep learning. arXiv preprint arXiv:1410.3831
Nguyen TD, Tran T, Phung D, Venkatesh S (2013) Latent patient profile modelling and applications with mixed-variate restricted Boltzmann machine. In: Proceedings of Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Gold Coast, Queensland, Australia
Nguyen TD, Tran T, Phung D, Venkatesh S (2013) Learning sparse latent representation and distance metric for image retrieval. In: Proceedings of IEEE international conference on multimedia & expo, California, USA, July 15–19
Otey ME, Parthasarathy S, Ghoting A (2005) Fast lightweight outlier detection in mixed-attribute data. Technical report, OSU–CISRC–6/05–TR43
Pai HT, Wu F, Hsueh PYSS (2014) A relative patterns discovery for enhancing outlier detection in categorical data. Dec Support Syst 67:90–99
Papadimitriou S, Kitagawa H, Gibbons PB, Faloutsos C (2003) Loci: fast outlier detection using the local correlation integral. In: Proceedings. 19th international conference on data engineering, 2003. IEEE, pp 315–326
Salakhutdinov R, Hinton G (2009) Semantic hashing. Int J Approx Reas 50(7):969–978
Serfling R, Wang S (2014) General foundations for studying masking and swamping robustness of outlier identifiers. Statis Methodol 20:79–90
Sun J, Wyss R, Steinecker A, Glocker P (2014) Automated fault detection using deep belief networks for the quality inspection of electromotors. tm-Technisches Messen 81(5):255–263
Tagawa T, Tadokoro Y, Yairi T (2014) Structured denoising autoencoder for fault detection and analysis. In: ACML
Tang G, Pei J, Bailey J, Dong G (2015) Mining multidimensional contextual outliers from categorical relational data. Intell Data Anal 19(5):1171–1192
Taylor A, Leblanc S, Japkowicz N (2016) Anomaly detection in automobile control network data with long short-term memory networks. In: IEEE international conference on data science and advanced analytics (DSAA), 2016, IEEE, pp 130–139
Tran N, Jin H (2012) Detecting network anomalies in mixed-attribute data sets. In: Third international conference on knowledge discovery and data mining, 2010. WKDD’10, IEEE, pp 383–386
Tran T, Phung D, Venkatesh S (2013) Thurstonian Boltzmann machines: learning from multiple inequalities. In: International conference on machine learning (ICML), Atlanta, USA, June 16–21
Tran T, Phung DQ, Venkatesh S (2011) Mixed-variate restricted Boltzmann machines. In: Proceedings of 3rd Asian conference on machine learning (ACML), Taoyuan, Taiwan
Tran T, Luo W, Phung D, Morris J, Rickard K, Venkatesh S (2016) Preterm birth prediction: deriving stable and interpretable rules from high dimensional data. In: Conference on machine learning in healthcare, LA, USA
Tuor A, Kaplan S, Hutchinson B, Nichols N, Robinson S (2017) Deep learning for unsupervised insider threat detection in structured cybersecurity data streams. In: Proceedings of the AAAI-17 Workshop on Artificial Intelligence for Cyber Security, pp 224–231
Wang Y, Cai W, Wei P (2016) A deep learning approach for detecting malicious JavaScript code. Secur Commun Netw 9:1520–1534
Ye M, Li X, Orlowska ME (2009) Projected outlier detection in high-dimensional mixed-attributes data set. Expert Syst Appl 36(3):7104–7113
Zhai S, Cheng Y, Lu W, Zhang Z (2016) Deep structured energy based models for anomaly detection. arXiv preprint arXiv:1605.07717
Zhang K, Jin H (2010) An effective pattern based outlier detection approach for mixed attribute data. In: Australasian joint conference on artificial intelligence, Springer, pp 122–131
Zimek A, Schubert E, Kriegel HP (2012) A survey on unsupervised outlier detection in high-dimensional numerical data. Statis Anal Data Mining 5(5):363–387

Acknowledgements

This work is partially supported by the Telstra-Deakin Centre of Excellence in Big Data and Machine Learning.

Author information

Authors and Affiliations

Applied AI Institute, Deakin University, 75 Pigdons Rd, Waurn Ponds, VIC, 3216, Australia
Kien Do, Truyen Tran & Svetha Venkatesh

Corresponding author

Correspondence to Truyen Tran.

About this article

Cite this article

Do, K., Tran, T. & Venkatesh, S. Energy-based anomaly detection for mixed data. Knowl Inf Syst 57, 413–435 (2018). https://doi.org/10.1007/s10115-018-1168-z

Received: 28 February 2017
Revised: 03 November 2017
Accepted: 30 January 2018
Published: 12 February 2018
Issue Date: November 2018
DOI: https://doi.org/10.1007/s10115-018-1168-z
