Machine Learning Approaches for Anomaly Detection in Complex Systems
https://doi.org/10.5281/zenodo.19465217
Keywords:
Anomaly Detection; Machine Learning; Complex Systems; Autoencoder; XGBoost; AUC-ROC; Statistical Comparison
Abstract
Detecting anomalies in complex systems remains challenging because of high dimensionality, nonlinear relationships, and extreme class imbalance. This study presents a comparative, quantitative assessment of classical machine learning, deep learning, and supervised techniques for anomaly detection on multivariate system data. Five widely used models were systematically evaluated: Isolation Forest, One-Class Support Vector Machine (One-Class SVM), Local Outlier Factor, an autoencoder-based neural network, and Extreme Gradient Boosting (XGBoost). Model performance was assessed using accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve (AUC-ROC), and false-positive and false-negative rates. The supervised XGBoost model performed best overall, achieving 97.3% accuracy, an F1-score of 0.83, and an AUC-ROC of 0.96, with the lowest false-negative rate (3.1%). Among the unsupervised methods, the autoencoder outperformed the classical methods, with an accuracy of 95.6%, an F1-score of 0.75, an AUC-ROC of 0.92, and balanced error rates (false-positive rate: 4.8%; false-negative rate: 4.4%). Isolation Forest showed moderate performance (AUC-ROC: 0.89), while One-Class SVM and Local Outlier Factor had lower recall and higher error rates. Pairwise statistical comparisons of AUC-ROC values showed that XGBoost and the autoencoder were each significantly better than both One-Class SVM and Local Outlier Factor (p < 0.05) but did not differ significantly from each other (p = 0.07). Overall, the findings quantify the benefit of supervised learning when labeled data are available and show that deep autoencoder-based methods are effective for unsupervised anomaly detection in complex systems.
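
For readers who want to reproduce the kind of metrics reported in the abstract, the following is a minimal, dependency-free sketch of how accuracy, precision, recall, F1-score, false-positive rate, false-negative rate, and AUC-ROC can be computed from binary labels and anomaly scores. It is an illustration only, not the study's evaluation code: the function names (`confusion_counts`, `summary_metrics`, `auc_roc`), the label convention (1 = anomaly, 0 = normal), the 0.5 decision threshold, and the toy data are all assumptions introduced here.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN, assuming 1 = anomaly and 0 = normal."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def summary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-dependent metrics derived from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]

    def ratio(num: float, den: float) -> float:
        # Return 0.0 for an empty denominator instead of raising.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),  # normal points flagged as anomalies
        "fnr": ratio(fn, fn + tp),  # anomalies that were missed
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen anomaly receives a
    higher score than a randomly chosen normal point (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one anomaly and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 6 normal points and 2 anomalies.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.2, 0.7, 0.45]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = summary_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

On imbalanced data such as this toy set, accuracy can look high even when half of the anomalies are missed, which is why the false-negative rate and AUC-ROC are reported alongside it. The pairwise AUC computation is quadratic in the number of samples, so a rank-based implementation from an established library would be preferable on real datasets.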


