Sentiment Analysis and Text Classification for Depression Detection

Iffah Nadhirah Joharee, Nik Nur Wahidah Nik Hashim, Nur Syahirah Mohd Shah


Depression is an illness that can seriously harm a person's life. However, many people do not realize that they are depressed, and they tend to express their feelings through text or on social media. Text-based depression detection could therefore help identify the illness at an early stage. This research aims to build a depression detection system that identifies possible cues of depression in Bahasa Malaysia text. The text data were collected from depressed and healthy people via a Google Form. Three questions were asked: “Apakah kenangan manis yang anda ingat?” (What sweet memories do you remember?), “Apakah rutin harian anda?” (What is your daily routine?) and “Apakah keadaan yang membuatkan anda stress?” (What situations make you stressed?), which received 172, 169, and 170 responses, respectively. All the datasets are stored in a CSV file. Using Python, TF-IDF features were extracted and passed through a pipeline into several classifiers, including Random Forest, Multinomial Naïve Bayes, and Logistic Regression. The results are reported using the confusion matrix, accuracy, and F1-score. In addition, the text sentiment tools VADER and TextBlob were applied to the datasets to determine whether depressive text falls under negative sentiment and non-depressive text does not. The percentage differences between the actual sentiment and the sentiment assigned by VADER and by TextBlob were then determined. In the experiment, the highest score was achieved by the AdaBoost classifier, with an F1-score of 0.66. This best-performing model was chosen for use in the graphical user interface (GUI).
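The classification step described above can be illustrated with a minimal sketch, assuming scikit-learn is the library used (the abstract names only Python). The sample sentences, the labels, and the helper `build_model` are invented placeholders for illustration and are not taken from the study's data or code; Logistic Regression stands in for any of the classifiers mentioned.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.pipeline import Pipeline

# Invented placeholder responses (NOT the study's data).
# Label 1 = depressed, 0 = healthy; this encoding is an assumption.
texts = [
    "saya rasa sedih dan penat setiap hari",
    "saya gembira bersama keluarga",
    "saya tidak boleh tidur dan rasa kosong",
    "saya suka bersenam setiap pagi",
]
labels = [1, 0, 1, 0]


def build_model():
    """Hypothetical helper: TF-IDF features fed into one classifier."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


model = build_model()
model.fit(texts, labels)

# Scoring on the training texts only demonstrates the metric calls;
# a real evaluation would use a held-out test split.
predictions = model.predict(texts)
cm = confusion_matrix(labels, predictions, labels=[0, 1])
acc = accuracy_score(labels, predictions)
f1 = f1_score(labels, predictions)
print(cm)
print(acc, f1)
```

Swapping the `clf` step for `RandomForestClassifier`, `MultinomialNB`, or `AdaBoostClassifier` leaves the rest of the pipeline unchanged, which is one reason to wrap feature extraction and classification in a single pipeline object.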


Depression; Natural Language Processing; Machine Learning; Sentiment Analysis

