A Classifier Model to Detect Phishing Emails Using Ensemble Technique

Nthurima, Fredrick; Matheka, Abraham

Center for Open Access in Science (COAS)
OPEN JOURNAL FOR INFORMATION TECHNOLOGY (OJIT)
ISSN (Online) 2620-0627 * ojit@centerprode.com

2023 - Volume 6 - Number 2

Fredrick Nthurima
Kenyatta University, School of Engineering and Technology, Nairobi, KENYA

Abraham Matheka
Kenyatta University, School of Engineering and Technology, Nairobi, KENYA

Open Journal for Information Technology, 2023, 6(2), 157-172 * https://doi.org/10.32591/coas.ojit.0602.06157n
Received: 11 September 2023 ▪ Revised: 18 November 2023 ▪ Accepted: 24 December 2023

LICENCE: Creative Commons Attribution 4.0 International License.

ABSTRACT:
Phishing attacks usually exploit weaknesses in the way users behave. An attacker sends the recipient an email that mimics a genuine message but contains phishing links. When the recipient clicks an embedded link and enters their details, the attacker can harvest critical information such as credit card numbers, usernames or passwords. Online surveys rank phishing as the leading attack on web content, with financial institutions as the main targets. According to a 2017 survey by the Ponemon Institute LLC, losses due to phishing attacks amount to about $1.5 billion annually. Phishing is a global threat to information security that is on the rise with the growth of the Internet of Things (IoT), so better detection mechanisms are needed to mitigate these losses and the resulting reputational damage. This paper explores the use of multiple machine learning models, via the Random Forest algorithm, together with additional phishing email features to improve the accuracy of phishing detection and prevention. The project will review existing phishing methods, investigate the effect of combining two machine learning algorithms to detect and prevent phishing attacks, design and develop a supervised classifier to detect and prevent phishing emails, and test the model on existing data. The model will be trained by supervised learning on a dataset of benign and phishing emails. The expected accuracy is 99.9%, with False Negative (FN) and False Positive (FP) rates below 0.1%.
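The abstract does not state the feature set, the number of trees or the exact evaluation protocol, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It builds a simplified Random-Forest-style ensemble from depth-one decision trees (stumps), each trained on a bootstrap sample of the emails and a random subset of features, and combines them by majority vote. It then computes the accuracy and the false-positive and false-negative rates targeted above, assuming these rates mean FP/(FP+TN) and FN/(FN+TP). The feature names in `FEATURES`, the toy data and all helper names are hypothetical; a full Random Forest grows deeper trees.

```python
import random
from collections import Counter

# Hypothetical binary email features (1 = present); the paper does not list its feature set.
FEATURES = ("has_ip_url", "has_urgent_words", "link_text_mismatch", "has_html_form")


def train_stump(rows, labels, feature_ids):
    """Depth-one tree: pick the feature in feature_ids whose 0/1 split best fits the labels."""
    overall = Counter(labels).most_common(1)[0][0]  # fallback label for an empty branch
    best = None
    for f in feature_ids:
        branch = {0: Counter(), 1: Counter()}
        for x, y in zip(rows, labels):
            branch[x[f]][y] += 1
        # Each branch (feature absent / present) predicts its majority label
        rule = {v: branch[v].most_common(1)[0][0] if branch[v] else overall for v in (0, 1)}
        correct = sum(rule[x[f]] == y for x, y in zip(rows, labels))
        if best is None or correct > best[0]:
            best = (correct, f, rule)
    return best[1], best[2]


def train_forest(rows, labels, n_trees=25, n_sub_features=2, seed=0):
    """Random-Forest-style ensemble: bootstrap rows plus a random feature subset per tree."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(n_features), n_sub_features)  # random feature subset
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, x):
    """Majority vote over all stumps (label 1 = phishing, 0 = benign)."""
    votes = Counter(rule[x[f]] for f, rule in forest)
    return votes.most_common(1)[0][0]


def rates(y_true, y_pred):
    """Return (accuracy, false-positive rate, false-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return accuracy, fpr, fnr


if __name__ == "__main__":
    # Toy rows ordered as FEATURES; y = 1 marks a phishing email
    X = [(1, 1, 1, 0), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 0, 1),
         (0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (1, 0, 0, 0)]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    forest = train_forest(X, y)
    print(rates(y, [predict(forest, x) for x in X]))
```

Scoring the training rows, as in the demo, only checks that the pieces fit together; a real evaluation would measure these rates on held-out emails, for example with a train/test split or cross-validation.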

KEY WORDS: extractive model, abstractive model, hybrid model, natural language processing.

CORRESPONDING AUTHOR:
Fredrick Nthurima, Kenyatta University, School of Engineering and Technology, Nairobi, KENYA.
