Support Vector Machine Algorithm for SMS Spam Classification in The Telecommunication Industry

Nilam Nur Amir Sjarif; Yazriwati Yahya; Suriayati Chuprat; Nurul Huda Firdaus Mohd Azmi

DOI: https://doi.org/10.18517/ijaseit.10.2.10175


Affiliation (all authors): Advanced Technology Department, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, 54100 Kuala Lumpur, Malaysia


How to cite (IJASEIT):

Amir Sjarif, Nilam Nur, et al. “Support Vector Machine Algorithm for SMS Spam Classification in The Telecommunication Industry”. International Journal on Advanced Science, Engineering and Information Technology, vol. 10, no. 2, Apr. 2020, pp. 635-9, doi:10.18517/ijaseit.10.2.10175.

Abstract:

In recent years, we have witnessed a dramatic increase in the number of mobile users in the telecommunication industry. However, this has led to a drastic increase in the number of spam SMS messages. Short Message Service (SMS) is considered one of the most widely used communication services in the telecommunication industry. In practice, most users ignore spam because of low SMS rates and the limited number of spam classification tools. In this paper, we propose a Support Vector Machine (SVM) algorithm for SMS spam classification. SVM is considered one of the most effective data mining techniques. The proposed algorithm was evaluated on a public dataset from the UCI Machine Learning Repository, and its performance was compared with three other data mining techniques: Naïve Bayes, Multinomial Naïve Bayes, and K-Nearest Neighbor (KNN) with K = 1, 3, and 5. Based on accuracy, processing time, kappa statistic, error, and the number of false positive instances, SVM outperforms the other classifiers and is the most accurate at detecting and labeling spam messages, with an average accuracy of 98.9%. Comparing the error parameters overall, the highest error was found for KNN with K = 3 and K = 5, whereas the lowest error was obtained by SVM, followed by Multinomial Naïve Bayes. Therefore, the proposed method can serve as a strong baseline for further comparison in SMS spam classification.
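The abstract describes classifying SMS messages with an SVM and judging classifiers by accuracy, kappa statistic, error, and false positives. The following is a minimal, self-contained Python sketch of such a pipeline under stated assumptions; it is not the authors' implementation. The paper's feature extraction, kernel, and tooling are not given here, so the binary bag-of-words features, the linear SVM trained with the Pegasos stochastic sub-gradient method (bias folded in as a constant feature and regularized with the weights, a common simplification), the toy messages, and all function names (`featurize`, `train_linear_svm`, `predict`, `evaluate`) are hypothetical. The abstract also does not say which error measure was used, so here error is simply 1 - accuracy. The sketch does not use the UCI dataset and reproduces none of the reported results.

```python
import random
import re


def featurize(text):
    """Binary bag-of-words as a sparse dict, plus a constant bias feature."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    features = {token: 1.0 for token in tokens}
    features["__bias__"] = 1.0  # lets the separating hyperplane have an offset
    return features


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(xs, ys, lam=0.1, epochs=100, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    ys must be +1 (spam) or -1 (ham). Returns a sparse weight dict.
    """
    rng = random.Random(seed)
    w = {}
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = ys[i] * dot(w, xs[i])
            for f in w:  # shrink step from the (lam / 2) * ||w||^2 term
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active: move toward this sample
                for f, v in xs[i].items():
                    w[f] = w.get(f, 0.0) + eta * ys[i] * v
    return w


def predict(w, x):
    # Ties go to ham (-1), so an untrained (empty) model flags nothing as spam.
    return 1 if dot(w, x) > 0 else -1


def evaluate(y_true, y_pred):
    """Accuracy, Cohen's kappa, error rate and false positives (spam = +1)."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)  # ham flagged as spam
    fn = n - tp - tn - fp
    accuracy = (tp + tn) / n
    # Chance agreement from the actual and predicted class proportions.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return {"accuracy": accuracy, "kappa": kappa,
            "error_rate": 1 - accuracy, "false_positives": fp}


# Hypothetical toy messages, NOT the UCI SMS Spam Collection.
spam = ["WIN a free prize now", "Free cash prize, claim now", "Claim your free prize today"]
ham = ["Are we meeting for lunch", "See you at home tonight", "Call me when you get home"]
xs = [featurize(m) for m in spam + ham]
ys = [1] * len(spam) + [-1] * len(ham)

w = train_linear_svm(xs, ys)
print(evaluate(ys, [predict(w, x) for x in xs]))
print(predict(w, featurize("URGENT: win a free prize now!!")))
```

Because `evaluate` only needs true and predicted labels, the same function could score Naïve Bayes or KNN predictions on a shared test split, which is the kind of side-by-side comparison the abstract reports; counting false positives separately matters because misfiling a legitimate message as spam is usually the costlier mistake.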


Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).