Automated Dataset Generation for Training Peer-to-Peer Machine Learning Classifiers

Journal of Network and Systems Management

Abstract

Peer-to-peer (P2P) traffic classifiers based on flow statistics have been shown to detect P2P traffic accurately. However, the performance of a machine learning classifier depends on the quality and recency of its training dataset, so on-line P2P traffic classification requires a way to keep that dataset accurate and up to date. This paper proposes automated training dataset generation for on-line P2P traffic classification, which allows the classifier to be retrained frequently. The proposed two-stage training dataset generator (TSTDG) combines a 3-class heuristic classification with a 3-class statistical classification. In the heuristic stage, traffic is classified as P2P, non-P2P, or unknown. In the statistical stage, a dual Decision Tree is built from the dataset generated in the heuristic stage to reduce the amount of traffic left classified as unknown. The final training dataset is generated from all flows that are classified in these two stages. The proposed system has been evaluated on traces captured from a campus network. Overall, the results show that the TSTDG can generate an accurate training dataset, classifying around 94% of total flows with high accuracy (98.59%) and a low false positive rate (1.27%).
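The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the port rules in `heuristic_stage`, the single `mean_pkt_size` feature, the `margin` parameter, and the one-split tree in `train_stump` are all hypothetical stand-ins, and the paper's dual Decision Tree is not reproduced. The sketch only shows how flows labeled by a heuristic stage could train a statistical stage that then labels some of the remaining unknown flows.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

P2P, NON_P2P, UNKNOWN = "p2p", "non-p2p", "unknown"


@dataclass(frozen=True)
class Flow:
    dst_port: int
    mean_pkt_size: float  # bytes; hypothetical flow-statistics feature


def heuristic_stage(flow: Flow) -> str:
    """Stage 1: hypothetical port rules standing in for the paper's heuristics."""
    if flow.dst_port in {25, 53, 80, 443}:
        return NON_P2P
    if flow.dst_port in {4662, 6881}:
        return P2P
    return UNKNOWN


def train_stump(labeled: List[Tuple[Flow, str]], margin: float) -> Callable[[Flow], str]:
    """Stage 2 stand-in: a one-split tree on mean_pkt_size.

    Flows closer than `margin` to the split stay UNKNOWN, which keeps
    the statistical stage 3-class (an assumption of this sketch).
    """
    sizes = sorted({f.mean_pkt_size for f, _ in labeled})
    best = None  # (errors, threshold, label_below, label_above)
    for lo, hi in zip(sizes, sizes[1:]):
        thr = (lo + hi) / 2
        for below, above in ((P2P, NON_P2P), (NON_P2P, P2P)):
            errors = sum(
                (below if f.mean_pkt_size <= thr else above) != label
                for f, label in labeled
            )
            if best is None or errors < best[0]:
                best = (errors, thr, below, above)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, thr, below, above = best

    def classify(flow: Flow) -> str:
        if abs(flow.mean_pkt_size - thr) < margin:
            return UNKNOWN
        return below if flow.mean_pkt_size <= thr else above

    return classify


def generate_training_set(flows: List[Flow], margin: float = 100.0) -> List[Tuple[Flow, str]]:
    """Label flows in two stages and keep every flow that received a class."""
    stage1 = [(f, heuristic_stage(f)) for f in flows]
    labeled = [(f, y) for f, y in stage1 if y != UNKNOWN]
    classify = train_stump(labeled, margin)
    stage2 = [(f, classify(f)) for f, y in stage1 if y == UNKNOWN]
    return labeled + [(f, y) for f, y in stage2 if y != UNKNOWN]


flows = [
    Flow(80, 900.0), Flow(443, 1000.0),    # heuristic stage: non-P2P
    Flow(6881, 200.0), Flow(4662, 300.0),  # heuristic stage: P2P
    Flow(50000, 250.0), Flow(50001, 620.0), Flow(50002, 1100.0),  # unknown after stage 1
]
dataset = generate_training_set(flows)
coverage = len(dataset) / len(flows)  # fraction of flows that ended up labeled
```

In this toy input, the flow that lies close to the learned split stays unknown and is left out of the training set, which mirrors the abstract's point that only flows classified in one of the two stages contribute to the final dataset.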

Notes

  1. Flows are distinguished based on the 5-tuple [Source IP, Destination IP, Source Port, Destination Port, Protocol].

  2. J48 is the open source Java implementation of the C4.5 algorithm in Weka.
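The flow definition in Note 1 can be sketched in Python as grouping packets by their 5-tuple. The `Packet` and `FlowKey` types and the `group_into_flows` helper are hypothetical names introduced for illustration, and the sketch assumes unidirectional flows (a reply packet with swapped endpoints lands in a separate flow), which the note itself does not specify.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int  # bytes


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def group_into_flows(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets that share the same 5-tuple into one flow."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        key = FlowKey(p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key].append(p)
    return dict(flows)


packets = [
    Packet("10.0.0.1", "10.0.0.2", 50000, 6881, "TCP", 1200),
    Packet("10.0.0.1", "10.0.0.2", 50000, 6881, "TCP", 400),
    Packet("10.0.0.1", "10.0.0.2", 50000, 6881, "UDP", 90),  # same endpoints, different protocol
]
flows_by_key = group_into_flows(packets)
```

Per-flow statistics such as packet count or mean packet size can then be computed from each list of packets.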

Acknowledgments

The work was done when the first author was with the Faculty of Electrical Engineering, Universiti Teknologi Malaysia.

Author information

Corresponding author

Correspondence to Roozbeh Zarei.

About this article

Cite this article

Zarei, R., Monemi, A. & Marsono, M.N. Automated Dataset Generation for Training Peer-to-Peer Machine Learning Classifiers. J Netw Syst Manage 23, 89–110 (2015). https://doi.org/10.1007/s10922-013-9279-z

  • DOI: https://doi.org/10.1007/s10922-013-9279-z
