Phishing Attack Detection Using Machine Learning

Olaniyi A. Ayeni; Hope O. Akinyemi

DOI: https://doi.org/10.59232/CYS-V2I3P102

Research Article | Open Access

Volume 2 | Issue 3 | Year 2024 | Article Id: CYS-V2I3P102

Received: 15 Jul 2024 | Revised: 20 Aug 2024 | Accepted: 18 Sep 2024 | Published: 30 Sep 2024

Citation

Olaniyi A. Ayeni, Hope O. Akinyemi. “Phishing Attack Detection Using Machine Learning.” DS Journal of Cyber Security, vol. 2, no. 3, pp. 15-25, 2024.

Abstract

Phishing is an attack that acquires sensitive information from users, such as login credentials (usernames, passwords, or other security tokens) and card information. In most cases, phishing does not require high technical skill, which makes it the third most commonly performed attack according to multiple sources. Although many solutions have been proposed to eradicate phishing completely, or at least to mitigate about 80% of phishing attacks, phishing remains prevalent. The system of Vishakha P.R. & Sahil S.J. (2020) was unable to detect web pages automatically and was limited to system applications alone. Therefore, this work aims to design a system for detecting phishing attacks, implement the design, and evaluate the algorithms’ performance. Jupyter Notebook was the environment used for the analysis. The two algorithms used were two-class logistic regression and Naïve Bayes. The dataset (phishing.csv), obtained from Kaggle, was divided into training and test sets in the ratio 80:20. Two-class logistic regression achieved 94.2% accuracy, compared with 85% test accuracy for the Naïve Bayes algorithm.
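
The workflow summarised in the abstract (an 80:20 train/test split, followed by two-class logistic regression and Naïve Bayes compared on test accuracy) can be illustrated with a minimal sketch. This is not the authors' notebook code: it uses only the Python standard library, a synthetic dataset stands in for phishing.csv, the Naïve Bayes variant shown is a Bernoulli one chosen as an assumption, and every function name and parameter below is hypothetical.

```python
import math
import random


def make_synthetic(n=500, n_features=6, seed=0):
    """Hypothetical stand-in for phishing.csv: binary features, label 1 = phishing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        # Assumption: each feature agrees with the label 80% of the time.
        x = [y if rng.random() < 0.8 else 1 - y for _ in range(n_features)]
        rows.append((x, y))
    return rows


def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split it 80:20 into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]


def train_logistic(train, epochs=50, lr=0.1):
    """Fit two-class logistic regression by stochastic gradient descent."""
    w = [0.0] * len(train[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in train:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(phishing | x)
            err = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_logistic(model, x):
    """Predict 1 (phishing) when the linear score is positive."""
    w, b = model
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def train_nb(train):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing."""
    d = len(train[0][0])
    counts = {0: [0] * d, 1: [0] * d}
    totals = {0: 0, 1: 0}
    for x, y in train:
        totals[y] += 1
        for j, xj in enumerate(x):
            counts[y][j] += xj
    priors = {c: totals[c] / len(train) for c in (0, 1)}
    probs = {c: [(counts[c][j] + 1) / (totals[c] + 2) for j in range(d)]
             for c in (0, 1)}
    return priors, probs


def predict_nb(model, x):
    """Pick the class with the larger log posterior."""
    priors, probs = model
    scores = {}
    for c in (0, 1):
        score = math.log(priors[c])
        for xj, p in zip(x, probs[c]):
            score += math.log(p if xj else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


def accuracy(predict, model, test):
    """Fraction of test rows whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in test) / len(test)


if __name__ == "__main__":
    train_rows, test_rows = split_80_20(make_synthetic())
    print("logistic regression:",
          accuracy(predict_logistic, train_logistic(train_rows), test_rows))
    print("naive Bayes:",
          accuracy(predict_nb, train_nb(train_rows), test_rows))
```

Because the data here is synthetic, the accuracies this sketch prints say nothing about the figures reported in the abstract; it only shows the shape of the comparison.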

Keywords

Phishing detection, Naïve Bayes, Machine Learning, Logistic regression, Cyber production.

References

[1] Sadia Afroz, and Rachel Greenstadt, “Phishzoo: Detecting Phishing Websites by Looking at Them,” 2011 IEEE Fifth International Conference on Semantic Computing, Palo Alto, USA, pp. 368-375, 2011.

[2] Olaniyi Abiodun Ayeni, “A Supervised Machine Learning Algorithm for Detecting Malware,” Journal of Internet Technology and Secured Transactions, vol. 10, no. 1, pp. 764-769, 2022.

[3] Ram Basnet, Srinivas Mukkamala, and Andrew H. Sung, “Detection of Phishing Attacks: A Machine Learning Approach,” Soft Computing Applications in Industry, vol. 226, pp. 373-383, 2008.

[4] Ekaba Bisong, Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners, 1st ed., Apress Berkeley, CA, pp. 243-250, 2019.

[5] Yan Chen, “Trust Calibration of Automated Security IT Artifacts: A Multi-Domain Study of Phishing-Website Detection Tools,” Information & Management, vol. 58, no. 1, 2021.

[6] Juan Chen, and Chuanxiong Guo, “Online Detection and Prevention of Phishing Attacks,” 2006 First International Conference on Communications and Networking in China, Beijing, China, pp. 1-7, 2006.

[7] Sujata Garera, “A Framework for Detection and Measurement of Phishing Attacks,” WORM '07: Proceedings of the 2007 ACM workshop on Recurring Malcode, pp. 1-8, 2007.

[8] N. Swapna Goud, and Anjali Mathur, “Feature Engineering Framework to Detect Phishing Websites Using URL Analysis,” International Journal of Advanced Computer Science and Applications, vol. 12, no. 7, pp. 295-303, 2021.

[9] Trevor Hastie, Jerome Friedman, and Robert Tibshirani, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 1st ed., Springer New York, 2001.

[10] Mohith Gowda H.R. et al., “Development of Anti-Phishing Browser Based on Random Forest and Rule of Extraction Framework,” Cybersecurity, vol. 3, 2020.

[11] Mazharul Islam, and Nihad Karim Chowdhury, “Phishing Websites Detection Using Machine Learning-Based Classification Techniques,” International Conference on Advanced Information and Communication Technology, Chittagong, Bangladesh, vol. 10, no. 9, pp. 4393-4402, 2016.

[12] Ankit Kumar Jain, and B.B. Gupta, “Towards Detection of Phishing Websites on Client-Side Using Machine Learning-Based Approach,” Telecommunication Systems, vol. 68, pp. 687-700, 2018.

[13] Markus Jakobsson, “Modeling and Preventing Phishing Attacks,” Financial Cryptography and Data Security, vol. 5, pp. 1-19, 2005.

[14] M.I. Jordan, and T.M. Mitchell, “Machine Learning: Trends, Perspectives, and Prospects,” Science, vol. 349, no. 6245, pp. 255-260, 2015.

[15] R. Kiruthiga, and D. Akila, “Phishing Websites Detection Using Machine Learning,” International Journal of Recent Technology and Engineering, vol. 8, no. 2S11, pp. 111-114, 2019.

[16] Arun D. Kulkarni, and Leonard L. Brown III, “Phishing Websites Detection Using Machine Learning,” Computer Science Faculty Publications and Presentations, pp. 8-13, 2019.

[17] Niklas Lavesson, and Paul Davidsson, “Evaluating Learning Algorithms and Classifiers,” International Journal of Intelligent Information and Database Systems, vol. 1, no. 1, pp. 37-52, 2007.

[18] Scott Menard, Applied Logistic Regression Analysis, 2nd ed., Sage Publications, New Delhi, India, 2022.

[19] H.R. Mohith Gowda et al., “Development of Anti-Phishing Browser Based on Random Forest and Rule of Extraction Framework,” Cybersecurity, vol. 3, pp. 1-14, 2020.

[20] J.R. Quinlan, “Learning Decision Tree Classifiers,” ACM Computing Surveys, vol. 28, no. 1, pp. 71-72, 1996.

[21] Jiangtao Ren et al., “Naive Bayes Classification of Uncertain Data,” 2009 Ninth IEEE International Conference on Data Mining, Miami Beach, FL, USA, pp. 944-949, 2009.

[22] S.R. Safavian, and D. Landgrebe, “A Survey of Decision Tree Classifier Methodology,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, no. 3, pp. 660-674, 1991.

[23] Yan-yan Song, and Ying Lu, “Decision Tree Methods: Applications for Classification and Prediction,” Shanghai Archives of Psychiatry, vol. 27, no. 2, pp. 130-135, 2015.

[24] Mahajan Mayuri Vilas et al., “Detection of Phishing Website Using Machine Learning Approach,” 2019 4th International Conference on Electrical, Electronics, Communication, Computer Technologies and Optimization Techniques, Mysuru, India, pp. 384-389, 2019.

[25] Vishakha Prashant Ratnaparkhi, and Sahil Siddharth Jambhulkar, “Framework for Detection and Prevention of Phishing Website Using Machine Learning Approach,” Computer Science, 2020.

[26] Kiri Wagstaff, “Machine Learning that Matters,” arXiv, 2012.

[27] Colin Whittaker, Brian Ryner, and Marria Nazif, “Large-Scale Automatic Classification of Phishing Pages,” Network and Distributed System Security (NDSS) Symposium, pp. 1-14, 2010.

[28] Zhongheng Zhang, “Naïve Bayes Classification in R,” Annals of Translational Medicine, vol. 4, no. 12, pp. 1-5, 2016.
