A Hybrid AI Framework for Intelligent Spam Email Detection
Abstract
With the exponential growth of digital communication, email remains one of the most widely used channels. Yet the surge of unsolicited and malicious emails, commonly referred to as spam, poses a considerable threat to data security and user privacy. This study offers a comparative analysis of classical machine learning and deep learning models for effective spam email detection. A real-world dataset consisting of over 5,500 email messages labeled as either spam or ham (non-spam) was employed. The dataset underwent preprocessing, including text normalization and feature extraction using Term Frequency-Inverse Document Frequency (TF-IDF). Class imbalance was handled using the Synthetic Minority Over-sampling Technique (SMOTE). Nine models were trained: Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Gradient Boosting (Boost), CNN, and DistilBERT + XGB. Model performance was evaluated using accuracy, precision, recall, F1-score, and AUC-ROC metrics. Among all models, CNN achieved the highest accuracy and precision of 99.85%, along with 93% recall and a 96% F1-score, indicating strong detection capability with minimal false positives. The findings suggest that deep learning and ensemble-based techniques, particularly CNN and RF, offer robust and scalable solutions for real-time spam detection systems. This research contributes to the development of intelligent email filtering systems and offers a foundation for future advancements using deep learning and explainable AI.
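As an illustration of how the reported metrics relate to one another, the following minimal sketch computes precision, recall, and F1-score from confusion-matrix counts and checks that the reported F1-score follows from the reported precision and recall. The function names and the example counts are hypothetical and are not taken from the study's code; only the 99.85% precision and 93% recall figures come from the abstract.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: spam correctly flagged, fp: ham wrongly flagged, fn: spam missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical counts, for illustration only (not from the study's data).
example = precision_recall_f1(tp=90, fp=10, fn=30)

# Figures reported for CNN in the abstract: precision 99.85%, recall 93%.
reported_f1 = f1_from(0.9985, 0.93)
print(f"F1 implied by the reported precision and recall: {reported_f1:.2f}")
```

Under these assumptions, the implied F1-score rounds to 0.96, which agrees with the 96% reported for CNN.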


