Elsevier

Computer Communications

Volume 36, Issue 12, 1 July 2013, Pages 1329-1340

Online NetFPGA decision tree statistical traffic classifier

https://doi.org/10.1016/j.comcom.2013.05.004

Abstract

Classifying online network traffic is becoming critical in network management and security. Recently, new classification methods based on the analysis of statistical features of transport-layer traffic have been proposed. While these methods address the limitations of port-based and payload-based traffic classification, current software-based solutions are not fast enough to handle the traffic of today’s high-speed networks. In this paper, we propose an online statistical traffic classifier using the C4.5 machine learning algorithm running on the NetFPGA platform. Our NetFPGA classifier is constructed by adding three main modules to the NetFPGA reference switch design: a Netflow module, a feature extractor module, and a C4.5 search tree classifier. The proposed classifier is able to classify the input traffic at the maximum line speed of the NetFPGA platform, i.e., 8 Gbps, without any packet loss. Our method is based on the statistical features of the first few packets of a flow. The flow is classified just a few microseconds after the desired number of packets has been received.

Introduction

Classifying network traffic accurately at line speed is one of the most challenging issues in network management and network services. Internet traffic consists of numerous flows from different types of applications. The quality and usability of some applications, such as voice-over-IP (VoIP), real-time multi-player games, and financial trading platforms, depend on meeting specific requirements such as bounded end-to-end delay, jitter, and bandwidth guarantees [1]. Recently, the increase in network traffic generated by emerging applications such as peer-to-peer (P2P), video streaming, and online gaming has been causing problems for limited-bandwidth networks such as university networks [2]. From the administrator’s point of view, accurate on-the-fly network traffic classification is essential for fair bandwidth sharing as well as early detection of intrusions, malicious attacks, and forbidden applications [3]. Recently, machine learning classifiers such as those based on decision trees [4], support vector machines (SVM) [5], and signature matching [6] have been proposed. However, traffic classification at line speed remains a challenge.

Network traffic classification algorithms are generally divided into two groups based on network data layering: stateless and stateful. In stateless classifiers (often called packet classifiers), the features that distinguish one traffic class from another are extracted from individual packets. IP addresses, port numbers, and even the interval between two consecutive packets are examples of stateless features. Stateful features, on the other hand, are extracted from traffic flows, which means that a stateful flow classifier needs to keep track of all active flows. For a stateful classifier, statistical properties of the transport layer such as flow duration, flow size, and packet inter-arrival time distinguish between different classes of network applications. These statistical parameters are unique to a specific application at a specific time. Stateful classifiers are more advanced, more complicated, slower, and more accurate than stateless classifiers. Therefore, to achieve effective inline traffic classification, transport-layer features must be computed in time for stateful flow classification.
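To make the stateful features concrete, the following minimal Python sketch computes flow duration, flow size, and mean packet inter-arrival time over the first few packets of a single flow. The function name `flow_features`, the `(timestamp, size)` packet representation, and the default window of five packets are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A packet is modeled as (arrival time in seconds, size in bytes).
# This representation is an assumption made for illustration.
Packet = Tuple[float, int]


def flow_features(packets: List[Packet], first_n: int = 5) -> Dict[str, float]:
    """Compute simple statistical features over the first `first_n` packets."""
    window = packets[:first_n]
    if not window:
        raise ValueError("flow has no packets")

    times = [t for t, _ in window]
    sizes = [s for _, s in window]

    # Inter-arrival times between consecutive packets in the window.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    return {
        "duration": times[-1] - times[0],
        "total_bytes": float(sum(sizes)),
        "mean_size": sum(sizes) / len(sizes),
        "mean_iat": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# Hypothetical flow: only the first three packets contribute to the features.
example = [(0.0, 60), (0.5, 1500), (1.0, 1500), (2.0, 40)]
features = flow_features(example, first_n=3)
```

Because only a fixed-size window of packets is examined, the per-flow cost is bounded, which is what makes classification shortly after the last required packet plausible.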

One of the main problems of stateful classifiers is the difficulty of maintaining flow information [7]. This problem is aggravated in high-bandwidth networks, where the statistical features of a larger number of active flows must be tracked. Another consideration is that a stateful classifier must be reconfigurable to suit different traffic classification tasks. To keep pace with current high-speed networks, critical parts such as feature extraction and classification are implemented as hardware modules.
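The bookkeeping burden can be seen in a small software model of a flow table. The sketch below keeps one record per conversation and updates it for packets in both directions, outbound (uplink) and inbound (downlink). The `FlowTable` and `FlowRecord` names, the 5-tuple key layout, and the `is_uplink` flag (which a real device might derive from the ingress port) are hypothetical, and a Python dictionary only stands in for whatever memory structure the hardware uses.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol); a hypothetical key layout.
FiveTuple = Tuple[str, int, str, int, int]


@dataclass
class FlowRecord:
    uplink_packets: int = 0
    downlink_packets: int = 0
    uplink_bytes: int = 0
    downlink_bytes: int = 0


class FlowTable:
    """Tracks per-flow counters for both directions of each conversation."""

    def __init__(self) -> None:
        self._flows: Dict[FiveTuple, FlowRecord] = {}

    def update(self, pkt: FiveTuple, size: int, is_uplink: bool) -> FlowRecord:
        src_ip, src_port, dst_ip, dst_port, proto = pkt
        # Store every flow with the inside endpoint first, so that uplink
        # and downlink packets of one conversation share a single record.
        key = pkt if is_uplink else (dst_ip, dst_port, src_ip, src_port, proto)
        record = self._flows.setdefault(key, FlowRecord())
        if is_uplink:
            record.uplink_packets += 1
            record.uplink_bytes += size
        else:
            record.downlink_packets += 1
            record.downlink_bytes += size
        return record

    def __len__(self) -> int:
        return len(self._flows)


table = FlowTable()
table.update(("10.0.0.5", 40000, "93.184.216.34", 80, 6), 60, is_uplink=True)
rec = table.update(("93.184.216.34", 80, "10.0.0.5", 40000, 6), 1500, is_uplink=False)
```

The table grows with the number of active flows and this sketch never evicts entries, which is exactly the state-maintenance cost that becomes harder to bear at high bandwidth.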

This paper proposes a NetFPGA-based online statistical traffic classifier using the C4.5 decision tree method. The proposed architecture is fully parameterizable to work with different numbers of features and is able to classify traffic at multiple full-duplex gigabit line rates with minimal delay and without packet loss. The proposed classifier is constructed by implementing three main modules in hardware: a Netflow module that provides the statistical information of both uplink and downlink flows between two endpoints, a feature extractor module that is parameterizable for working with different feature lists, and a programmable decision tree classifier. The prototype classifier exhibits an aggregated throughput of 8 Gbps without packet loss.
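As a rough illustration of what a programmable decision tree classifier involves, the sketch below stores a tree as a table of nodes and classifies a feature vector by walking from the root to a leaf, using the `value <= threshold` tests that C4.5 produces for continuous attributes. The `Node` layout, the feature order, the thresholds, and the class labels are all hypothetical; the hardware encoding used in the paper is not described in this section.

```python
from typing import NamedTuple, Sequence


class Node(NamedTuple):
    feature: int      # index into the feature vector (ignored for leaves)
    threshold: float  # take the left branch if feature value <= threshold
    left: int         # table index of the left child
    right: int        # table index of the right child
    label: int        # class label for a leaf, -1 for an internal node


def classify(tree: Sequence[Node], features: Sequence[float]) -> int:
    """Walk the node table from the root (entry 0) until a leaf is reached."""
    index = 0
    while tree[index].label < 0:
        node = tree[index]
        if features[node.feature] <= node.threshold:
            index = node.left
        else:
            index = node.right
    return tree[index].label


# Hypothetical tree over [mean packet size in bytes, mean inter-arrival time in s].
TREE = [
    Node(0, 200.0, 1, 2, -1),  # root: small packets go left
    Node(1, 0.05, 3, 4, -1),   # small packets: split on inter-arrival time
    Node(0, 0.0, 0, 0, 2),     # leaf: class 2
    Node(0, 0.0, 0, 0, 0),     # leaf: class 0
    Node(0, 0.0, 0, 0, 1),     # leaf: class 1
]

label_a = classify(TREE, [120.0, 0.02])
label_b = classify(TREE, [120.0, 0.2])
label_c = classify(TREE, [1400.0, 0.02])
```

Keeping the tree in a table rather than in fixed logic means a retrained tree can be loaded by rewriting the table entries, which is one plausible reading of "programmable" here.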

This paper is organized as follows. Section 2 summarizes related work on network classification. Section 3 discusses the design requirements for implementing an online network traffic classifier on the NetFPGA platform. Section 4 introduces the hardware implementation of the network classifier. Section 5 presents a case study of using the proposed device for classifying network traffic. Finally, Section 6 concludes the paper with some suggestions for future work.

Section snippets

Related works on network classification

The classical method for classifying traffic is based on identifying well-known port numbers. Although this method is simple and fast enough for online traffic classification, it has been shown to be inaccurate for identifying current network traffic [8]. Recent popular applications such as online gaming, peer-to-peer, and multimedia streaming use protocol obfuscation or dynamic port hopping to evade detection.
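A minimal sketch of port-based classification shows both why it is fast and why it fails. The table below holds a handful of standard well-known port assignments; the function name `classify_by_port` and the `"unknown"` fallback are illustrative assumptions.

```python
from typing import Dict

# A small subset of well-known port assignments.
WELL_KNOWN_PORTS: Dict[int, str] = {
    22: "ssh",
    25: "smtp",
    53: "dns",
    80: "http",
    443: "https",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Label a packet by the first of its ports found in the table."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


web = classify_by_port(40000, 80)
hopping = classify_by_port(51413, 6881)
```

The lookup is a single table access, but an application that picks ports dynamically is reported as `"unknown"`, and one that deliberately runs over port 80 would be labeled `"http"` regardless of what it actually carries.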

Deep packet inspection (DPI) is a method used in traffic classification. In this method,

Design requirements for inline NetFPGA traffic classifier

This section describes the target NetFPGA hardware platform and the modifications required to turn it into an inline Netflow decision tree (DT) classifier. As illustrated in Fig. 1, the proposed device is intended to be placed before an edge router or a campus gateway, so all transmitted and received packets pass through the NetFPGA classifier. In this paper, all flows which are sent out from the campus network are named uplink flows, while the received

Proposed architecture

In this paper, we present a hardware architecture for an inline traffic classifier on the NetFPGA platform that applies ML classification to statistical features extracted from the first few packets transmitted between two endpoints. Fig. 2(a) illustrates our traffic classifier, which is built by adding two extra modules to the reference switch design. These modules, shown in color, are a flow classifier module added to the pipeline chain and a time stamp generator unit. The time stamp is a

Case studies

The accuracy of an online statistical traffic classifier depends on the dataset used to train the classifier. To obtain an accurate classifier, the training dataset must be generated accurately, with a sufficient number of samples labeled into different classes. There are several ways to generate the training dataset: DPI, heuristic methods, and gt [39]. In this section, we use two different training datasets, one generated with gt and another with a heuristic method, as case studies to

Conclusion and future work

In this paper, we proposed a low-cost inline flow statistical traffic classifier implemented on the NetFPGA platform, where statistical features are extracted from the first few packets of the bidirectional flow between two endpoints. The statistical features can be selected from 35 real-time statistical features. In order to classify online traffic without packet loss, we implemented all three main modules of the statistical classifier: a Netflow module, a feature extractor unit and a decision

References (46)

  • L. Shi et al., Network utility maximization for triple-play services, Computer Communications (2008)
  • J.-J. Zhao et al., Real-time feature selection in traffic classification, The Journal of China Universities of Posts and Telecommunications (2008)
  • R.D. Torres, M.Y. Hajjat, S.G. Rao, M. Mellia, M.M. Munafo, Inferring undesirable behavior from P2P traffic analysis,...
  • L. Bernaille et al., Traffic classification on the fly, SIGCOMM Computer Communication Review (2006)
  • W. Li, A.W. Moore, A machine learning approach for efficient traffic classification, in: Proceedings of the 15th...
  • R. Yuan et al., An SVM-based machine learning method for accurate internet traffic classification, Information Systems Frontiers (2010)
  • B.-C. Park, Y. Won, M.-S. Kim, J. Hong, Towards automated application signature generation for traffic identification,...
  • W. Li, K. Abdin, R. Dann, A. Moore, Approaching real-time network traffic classification, Technical Report RR-06-12,...
  • A.W. Moore, K. Papagiannaki, Toward the accurate identification of network applications, in: PAM, pp....
  • Packeteer, 2012....
  • R. Sommer, A. Feldman, An IDS using NetFlow data retrieved March 2008,...
  • T. Karagiannis et al., BLINC: multilevel traffic classification in the dark, SIGCOMM Computer Communication Review (2005)
  • A.W. Moore et al., Internet traffic classification using bayesian analysis techniques, SIGMETRICS Performance Evaluation Review (2005)
  • S. Zander et al., Automated traffic classification and application identification using machine learning
  • J. Erman, A. Mahanti, M. Arlitt, Internet traffic identification using machine learning techniques, in: Proceedings of...
  • Y. Wang, S.Z. Yu, Machine learned real-time traffic classifiers, in: Second International Symposium on Intelligent...
  • R. Bar Yanai, M. Langberg, D. Peleg, L. Roditty, Realtime classification for encrypted traffic, in: Proceedings of the...
  • C. Gu et al., Realtime encrypted traffic identification using machine learning, Journal of Software (2011)
  • M. Roughan et al., Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification
  • A. Moore, M. Crogan, A.W. Moore, Q. Mary, D. Zuev, M.L. Crogan, Discriminators for use in flow-based classification,...
  • N. Williams et al., A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification, SIGCOMM Computer Communication Review (2006)
  • A. Bermak et al., A compact 3D VLSI classifier using bagging threshold network ensembles, IEEE Transactions on Neural Networks (2003)
  • S. Lopez-Estrada, R. Cumplido, Decision tree based FPGA-architecture for texture sea state classification, in: IEEE...