Explainable AI-Based Mental Health Risk Detection using Social Media Text with Hybrid NLP and Machine Learning Models
Abstract
The rising prevalence of psychological disorders, including depression, anxiety, and suicidal ideation, has created an urgent need for scalable early-detection screening tools. Social media platforms provide a vast, continuous stream of user-generated content that serves as a "digital phenotype" reflecting the mental well-being of their users. While deep learning models, particularly transformer-based architectures, have achieved state-of-the-art accuracy in psychiatric risk classification, their complexity makes them effectively "black boxes," which hinders clinical adoption and undermines professional trust.
This research develops an explainable artificial intelligence framework for the automated detection of mental health risks in social media text. Using a hybrid architecture that integrates advanced Natural Language Processing (NLP) with traditional machine learning classifiers (specifically Support Vector Machines, Random Forest, and Logistic Regression), this study addresses the trade-off between predictive performance and transparency. The proposed methodology combines transformer-based contextual embeddings with traditional statistical features to identify fine-grained linguistic markers of psychological distress.
To support clinical validity, two post-hoc interpretability methods, SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), are integrated to surface specific indicators such as self-focused language, hopelessness, and sleep disturbances. Experimental results on benchmark datasets show that the hybrid framework reaches accuracies of up to 99% while providing the feature-level transparency required for ethical psychiatric screening. This study contributes a verifiable screening pipeline that supports data-driven clinical decision-making and fosters the responsible application of AI in mental healthcare.
Introduction
The global mental health landscape is characterized by a widening gap between the prevalence of psychiatric disorders and the availability of professional care. Conditions such as major depressive disorder and generalized anxiety affect hundreds of millions of individuals, leading to a substantial decrease in quality of life and imposing a massive socio-economic burden on global communities. Traditional psychiatric diagnostic methods primarily rely on periodic, face-to-face clinical interviews and retrospective self-reporting questionnaires. While these methods are rigorous, they are susceptible to recall bias and often fail to capture the transient, episodic nature of psychological distress. Furthermore, factors such as social stigma and the shortage of mental health practitioners often delay diagnosis until conditions become acute.
The emergence of social media as a primary channel for personal expression has created a unique opportunity for real-time, non-intrusive mental health monitoring. Individuals frequently use platforms like Reddit and Twitter to articulate their thoughts, disclose emotional struggles, and seek peer support. This user-generated content constitutes a rich repository of digital signals that reflect an individual's psychological state. Natural Language Processing and Machine Learning make it possible to analyze these digital footprints at scale. Automated screening tools can identify early warning signs of distress, enabling proactive interventions that are critical for preventing severe outcomes such as self-harm or suicidal ideation.
Despite the high predictive performance of modern AI, particularly deep learning models, their lack of transparency remains a primary obstacle to clinical integration. In healthcare, a risk assessment or diagnosis must be justifiable to both the practitioner and the patient. Black-box models that offer no explanation for their outputs fail to meet the ethical and accountability standards required in medical practice. Clinicians require "understandable" AI that can justify its predictions through specific behavioral or linguistic evidence aligned with established psychiatric criteria.
This paper proposes a hybrid framework that bridges the gap between high-performance predictive analytics and clinical interpretability. By combining the semantic richness of transformer-based NLP features with the relative transparency of traditional machine learning classifiers, we develop a system that not only predicts risk but also highlights the linguistic evidence behind each prediction. This transparency is achieved through the integration of SHAP and LIME, which attribute each model output to specific features such as negative valence, self-focused language, and fatigue-related markers. The remaining sections present the literature review, the methodology, and the experimental results of this interpretable mental health detection system.
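To make the idea of feature-level attribution concrete, the following is a minimal sketch and not the pipeline evaluated in this paper. It assumes a handful of invented toy posts, substitutes plain bag-of-words counts for the transformer embeddings, and trains a small logistic regression by stochastic gradient descent. The attribution uses the closed form that SHAP reduces to for a linear model with independent features (each token contributes its weight times the deviation of its count from the training mean, in log-odds units) rather than calling the shap or lime libraries. All names (POSTS, featurize, train, explain) are hypothetical.

```python
import math
from collections import Counter

# Invented toy posts for illustration only (1 = at-risk, 0 = control).
POSTS = [
    ("i feel hopeless and i cannot sleep", 1),
    ("i am so tired and alone", 1),
    ("i hate myself and feel empty", 1),
    ("great game with friends tonight", 0),
    ("we had a lovely dinner together", 0),
    ("excited about the new project", 0),
]

def tokenize(text):
    return text.lower().split()

# Fixed vocabulary built from the toy corpus.
VOCAB = sorted({tok for text, _ in POSTS for tok in tokenize(text)})

def featurize(text):
    """Bag-of-words counts (a stand-in for richer embeddings)."""
    counts = Counter(tokenize(text))
    return [float(counts[tok]) for tok in VOCAB]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, epochs=500, lr=0.5):
    """Logistic regression fitted with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w, b

def explain(text, w, b, X):
    """Linear-model attribution: phi_j = w_j * (x_j - mean_j), in log-odds."""
    x = featurize(text)
    means = [sum(row[j] for row in X) / len(X) for j in range(len(VOCAB))]
    base = b + sum(wj * mj for wj, mj in zip(w, means))
    phi = {tok: wj * (xj - mj) for tok, wj, xj, mj in zip(VOCAB, w, x, means)}
    return base, phi

X = [featurize(text) for text, _ in POSTS]
y = [label for _, label in POSTS]
w, b = train(X, y)

query = "i feel hopeless"
base, phi = explain(query, w, b, X)
logit = base + sum(phi.values())  # attributions sum back to the model output
top = sorted(phi.items(), key=lambda kv: abs(kv[1]), reverse=True)[:3]
print("risk probability:", round(sigmoid(logit), 3))
print("largest contributions (log-odds):", top)
```

The baseline plus the per-token contributions reproduces the model's log-odds exactly, which is the additivity property that lets an attribution be read as an account of a single prediction. With a non-linear classifier or dense embeddings this closed form no longer applies, and a sampling-based or model-specific explainer would be needed instead.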


