A Hybrid AI Framework For Intelligent Spam Email Detection

Didar Hussain; Sefat Ghayoor Khan; Abdullah Sadiq; Waqas Khan; Azaz Ali; Shams ul Arifeen; Adnan Khan; Muhammad Awais

Authors

Didar Hussain Department of Computer Science, Abdul Wali Khan University Mardan, KPK, Pakistan.
Sefat Ghayoor Khan Department of Computer Science, Bacha Khan University Charsadda, KPK, Pakistan.
Abdullah Sadiq Department of Computer Science, Bacha Khan University Charsadda, KPK, Pakistan.
Waqas Khan Department of Computer Science, Bacha Khan University Charsadda, KPK, Pakistan.
Azaz Ali Department of Computer Science, Abasyn University Peshawar, KPK, Pakistan.
Shams ul Arifeen Department of Computer Science, Abdul Wali Khan University Mardan, KPK, Pakistan.
Adnan Khan Department of Computer Science, Abasyn University Peshawar, KPK, Pakistan.
Muhammad Awais Department of Computer Science, Bacha Khan University Charsadda, KPK, Pakistan.


Abstract

With the exponential growth of digital communication, email remains one of the most widely used communication channels. Yet the rise of unsolicited and malicious emails, commonly referred to as spam, poses a considerable threat to data security and user privacy. This study presents a comparative analysis of classical machine learning and deep learning models for effective spam email detection. A real-world dataset of over 5,500 email messages labeled as either spam or ham (non-spam) was used. The dataset was preprocessed with text normalization, and features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF). Class imbalance was handled using the Synthetic Minority Over-sampling Technique (SMOTE). Nine models were trained: Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Gradient Boosting (Boost), a Convolutional Neural Network (CNN), and DistilBERT + XGBoost (XGB). Model performance was evaluated using accuracy, precision, recall, F1-score, and AUC-ROC. Among all models, CNN achieved the highest accuracy and precision (99.85%), along with 93% recall and a 96% F1-score, indicating strong detection capability with minimal false positives. The findings suggest that CNN and ensemble-based techniques such as RF offer robust and scalable solutions for real-time spam detection systems. This research contributes to the development of intelligent email filtering systems and provides a foundation for future work using deep learning and explainable AI.
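The preprocessing and evaluation steps summarized above (text normalization, TF-IDF features, SMOTE oversampling, and precision/recall/F1) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalization rule (lowercasing and splitting on non-alphanumeric characters), the smoothed IDF variant, the choice of k, the toy messages, and the function names `normalize`, `tfidf`, `smote` and `metrics` are all hypothetical. It does not train any of the nine classifiers; a production pipeline would more likely use scikit-learn's `TfidfVectorizer` and imbalanced-learn's `SMOTE`.

```python
import math
import random
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def tfidf(docs):
    """Return (vocab, vectors): term frequency times smoothed IDF, L2-normalized.

    Assumed IDF variant (hypothetical choice): idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    tokenized = [normalize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard empty documents
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style.

    Each sample lies on the segment between a random minority vector and one of
    its k nearest minority neighbours (Euclidean distance). Needs >= 2 samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def metrics(y_true, y_pred, positive="spam"):
    """Accuracy plus precision, recall and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy data: 2 spam vs 4 ham messages.
    docs = ["WIN a FREE prize now!!!", "Free entry: claim your prize",
            "Meeting moved to 3pm", "Lunch tomorrow?",
            "Project report attached", "See you at the seminar"]
    labels = ["spam", "spam", "ham", "ham", "ham", "ham"]
    vocab, X = tfidf(docs)
    spam_vecs = [x for x, y in zip(X, labels) if y == "spam"]
    X_balanced = X + smote(spam_vecs, n_new=2, k=1)  # 4 spam vs 4 ham
    labels_balanced = labels + ["spam", "spam"]
    print(len(X_balanced), labels_balanced.count("spam"))  # 8 4
    print(metrics(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"]))
    # accuracy 0.75, precision 1.0, recall 0.5, f1 about 0.667
```

Oversampling of this kind is normally applied only to the training split, after the train/test split, so that synthetic samples derived from test messages do not leak into training. As a quick consistency check on the reported figures, and assuming the 99.85% figure refers to precision, F1 = 2PR/(P + R) with P = 0.9985 and R = 0.93 gives approximately 0.963, which matches the 96% F1-score reported for the CNN.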
