Online NetFPGA decision tree statistical traffic classifier
Introduction
Classifying network traffic accurately at line speed is one of the most challenging issues in network management and related services. Internet traffic consists of numerous flows from different types of applications. The quality and usability of some applications, such as voice-over-IP (VoIP), real-time multi-player games, and financial trading platforms, depend on meeting specific requirements such as bounded end-to-end delay, jitter, and bandwidth guarantees [1]. Recently, the increase in network traffic generated by emerging applications such as peer-to-peer (P2P), video streaming, and online gaming has been causing problems for limited-bandwidth networks such as university networks [2]. From the administrator's point of view, accurate on-the-fly network traffic classification is essential for fair bandwidth sharing as well as for early detection of intrusions, malicious attacks, and forbidden applications [3]. Recently, machine learning classifiers such as those based on decision trees [4], support vector machines (SVM) [5], and signature matching [6] have been proposed. However, traffic classification at line speed remains a challenge.
Network traffic classification algorithms are generally divided into two groups based on network data layering: stateless and stateful. In stateless classifiers (often called packet classifiers), the features that distinguish one traffic class from another are extracted from individual packets. IP address, port number, or even the time interval between two consecutive packets are examples of stateless features. Stateful features, on the other hand, are extracted from traffic flows. This means that a stateful flow classifier needs to keep track of all active flows. For a stateful classifier, statistical properties of the transport layer such as flow duration, flow size, and flow packet inter-arrival time distinguish between different classes of network applications. These statistical parameters are characteristic of a specific application at a specific time and can be used to tell application classes apart. Stateful classifiers are more advanced, more complicated, and slower than stateless classifiers, but also more accurate. Therefore, to achieve effective inline traffic classification, transport layer features must be computed in time for stateful flow classification.
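As a minimal software sketch of the stateful features named above (flow duration, flow size, and packet inter-arrival time), the following Python keeps one record per active flow and updates it on every packet. The `FlowStats` class, the 5-tuple key layout, and all field names are illustrative assumptions and do not represent the paper's hardware design.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port, protocol).
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class FlowStats:
    """Running statistics for one flow (assumed layout, for illustration)."""
    first_ts: float           # timestamp of the first packet, in seconds
    last_ts: float            # timestamp of the most recent packet
    packet_count: int = 0
    byte_count: int = 0       # flow size in bytes
    iat_sum: float = 0.0      # sum of packet inter-arrival times

    def update(self, ts: float, length: int) -> None:
        # An inter-arrival time exists only from the second packet onward.
        if self.packet_count > 0:
            self.iat_sum += ts - self.last_ts
        self.last_ts = ts
        self.packet_count += 1
        self.byte_count += length

    @property
    def duration(self) -> float:
        return self.last_ts - self.first_ts

    @property
    def mean_iat(self) -> float:
        if self.packet_count < 2:
            return 0.0
        return self.iat_sum / (self.packet_count - 1)


def track(packets: Iterable[Tuple[FlowKey, float, int]]) -> Dict[FlowKey, FlowStats]:
    """Build a flow table from (key, timestamp, length) tuples."""
    table: Dict[FlowKey, FlowStats] = {}
    for key, ts, length in packets:
        stats = table.get(key)
        if stats is None:
            stats = table[key] = FlowStats(first_ts=ts, last_ts=ts)
        stats.update(ts, length)
    return table
```

The table grows with the number of active flows, which is why keeping per-flow state becomes the costly part of a stateful classifier.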
One of the main problems of stateful classifiers is the difficulty of maintaining flow information [7]. This problem is further aggravated in high-bandwidth networks, where the statistical features of a larger number of active flows must be tracked. Another consideration is that the stateful classifier must be reconfigurable to suit different traffic classification tasks. To keep up with current high-speed networks, critical parts such as feature extraction and classification are implemented as hardware modules.
This paper proposes a NetFPGA-based online statistical traffic classifier using the C4.5 decision tree method. The proposed architecture is fully parameterizable to work with different numbers of features, and it can classify traffic on multiple full-duplex gigabit links at line rate with minimal delay and without packet loss. The proposed classifier is constructed by implementing three main modules in hardware: a Netflow module that provides the statistical information of both uplink and downlink flows between two endpoints, a feature extractor module that is parameterizable for working with different feature lists, and a programmable decision tree classifier. The prototype classifier exhibits an aggregated throughput of 8 Gbps without packet loss.
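To illustrate what a programmable decision tree classifier can look like in software, the sketch below stores the tree as a flat table of nodes and walks it with a feature vector, using the "feature <= threshold" test that C4.5 applies to continuous attributes. The node layout, feature indices, thresholds, and class labels are invented for illustration; the hardware encoding used by the paper is not shown in this section.

```python
from typing import NamedTuple, Optional, Sequence


class Node(NamedTuple):
    """One row of a hypothetical node table."""
    feature: Optional[int]   # index into the feature vector; None marks a leaf
    threshold: float         # split value for internal nodes
    left: int                # next row if features[feature] <= threshold
    right: int               # next row otherwise
    label: Optional[str]     # class label, set only on leaves


def classify(tree: Sequence[Node], features: Sequence[float]) -> Optional[str]:
    """Walk the node table from row 0 until a leaf is reached."""
    idx = 0
    while True:
        node = tree[idx]
        if node.feature is None:
            return node.label
        if features[node.feature] <= node.threshold:
            idx = node.left
        else:
            idx = node.right


# Made-up example tree over two assumed features:
#   features[0] = mean packet size in bytes
#   features[1] = mean inter-arrival time in seconds
EXAMPLE_TREE = [
    Node(0, 200.0, 1, 2, None),             # row 0: small packets?
    Node(1, 0.05, 3, 4, None),              # row 1: short inter-arrival time?
    Node(None, 0.0, -1, -1, "bulk"),        # row 2: leaf
    Node(None, 0.0, -1, -1, "interactive"), # row 3: leaf
    Node(None, 0.0, -1, -1, "other"),       # row 4: leaf
]
```

Because the tree is plain data, loading a different table changes the classifier without changing the traversal logic, which is the sense in which such a design is reprogrammable.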
This paper is organized as follows. Section 2 summarizes related work on network classification. Section 3 discusses the design requirements for implementing an online network traffic classifier on the NetFPGA platform. Section 4 introduces the hardware implementation of the network classifier. Section 5 presents a case study of using the proposed device for classifying network traffic. Finally, Section 6 concludes the paper with some suggestions for future work.
Section snippets
Related works on network classification
The classical method for classifying traffic is based on identifying well-known port numbers. Although this method is simple and fast enough for online traffic classification, it has been shown to be inaccurate for current network traffic [8]. Recent popular applications such as online gaming, peer-to-peer, and multimedia streaming use protocol obfuscation or dynamic port hopping to evade detection.
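The port-based approach amounts to a table lookup, as the short sketch below shows. The mapping lists a handful of IANA well-known ports; the function name and the "unknown" fallback are illustrative assumptions.

```python
# A few IANA well-known ports; a real table would be much larger.
WELL_KNOWN_PORTS = {
    22: "ssh",
    25: "smtp",
    53: "dns",
    80: "http",
    443: "https",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return the application for the first port found in the table."""
    # Check the destination port first, then the source port, so that
    # both directions of a client/server exchange get the same label.
    for port in (dst_port, src_port):
        app = WELL_KNOWN_PORTS.get(port)
        if app is not None:
            return app
    return "unknown"
```

A flow that uses a dynamic or non-standard port falls through to "unknown", which is the weakness described above.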
Deep packet inspection (DPI) is a method used in traffic classification. In this method,
Design requirements for inline NetFPGA traffic classifier
This section describes the target NetFPGA hardware platform as well as the modifications required to turn it into an inline Netflow decision tree (DT) classifier. As illustrated in Fig. 1, the proposed device is intended to be placed before an edge router or a campus gateway. Therefore, all transmitted and received packets pass through the NetFPGA classifier. In this paper, all flows that are sent out from the campus network are called uplink flows, while the received
Proposed architecture
In this paper, we present a hardware architecture on the NetFPGA platform for an inline traffic classifier that applies ML classification to statistical features extracted from the first few packets transmitted between two endpoints. Fig. 2(a) illustrates our traffic classifier, which is built by adding two extra modules to the reference switch design. These modules, shown in color, are a flow classifier module added to the pipeline chain and a time stamp generator unit. The time stamp is a
Case studies
The accuracy of an online statistical traffic classifier depends on the dataset used to train it. For the classifier to be accurate, the training dataset must be generated accurately, with a sufficient number of samples labeled into different classes. There are several ways to generate the training dataset: DPI, heuristic methods, and gt [39]. In this section, we use two different training datasets, one generated with gt and another with a heuristic method, as case studies to
Conclusion and future work
In this paper, we proposed a low-cost inline flow statistical traffic classifier implemented on the NetFPGA platform, where statistical features are extracted from the first few packets of the bidirectional flow between two endpoints. The statistical features can be selected from 35 real-time statistical features. In order to classify online traffic without packet loss, we implemented all three main modules of the statistical classifier: a Netflow module, a feature extractor unit and a decision
References (46)
- et al., Network utility maximization for triple-play services, Computer Communications (2008)
- et al., Real-time feature selection in traffic classification, The Journal of China Universities of Posts and Telecommunications (2008)
- R.D. Torres, M.Y. Hajjat, S.G. Rao, M. Mellia, M.M. Munafo, Inferring undesirable behavior from P2P traffic analysis,...
- et al., Traffic classification on the fly, SIGCOMM Computer Communication Review (2006)
- W. Li, A.W. Moore, A machine learning approach for efficient traffic classification, in: Proceedings of the 15th...
- et al., An SVM-based machine learning method for accurate internet traffic classification, Information Systems Frontiers (2010)
- B.-C. Park, Y. Won, M.-S. Kim, J. Hong, Towards automated application signature generation for traffic identification,...
- W. Li, K. Abdin, R. Dann, A. Moore, Approaching real-time network traffic classification, Technical Report RR-06-12,...
- A.W. Moore, K. Papagiannaki, Toward the accurate identification of network applications, in: PAM, pp....
- Packeteer, 2012....
- BLINC: multilevel traffic classification in the dark, SIGCOMM Computer Communication Review
- Internet traffic classification using bayesian analysis techniques, SIGMETRICS Performance Evaluation Review
- Automated traffic classification and application identification using machine learning
- Realtime encrypted traffic identification using machine learning, Journal of Software
- Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification
- A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification, SIGCOMM Computer Communication Review
- A compact 3D VLSI classifier using bagging threshold network ensembles, IEEE Transactions on Neural Networks