Using linguistic and topic analysis to classify sub-groups of online depression communities

Multimedia Tools and Applications

Abstract

Depression is a highly prevalent mental health problem and is a co-morbidity of other mental, physical, and behavioural disorders. The internet allows individuals who are depressed, or who care for those who are depressed, to connect with others via online communities; however, the characteristics of these discussions have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A total of 5,000 posts were randomly selected from 24 online communities. Five subgroups of online communities were identified: Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement, and Suicide. Psycholinguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the depression communities from those in the other subgroups. Topics and psycholinguistic features were found to be highly valid predictors of community subgroup. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in mental health.
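The pipeline summarised in the abstract (extract features from each post, then predict its community subgroup) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-category word list is a hypothetical stand-in for a psycholinguistic dictionary and is not taken from LIWC, the four training posts are invented, topic features are omitted, and a nearest-centroid rule replaces whatever classifier the study actually used.

```python
from collections import Counter
import math
import re

# Hypothetical mini-lexicon standing in for a psycholinguistic dictionary.
# The categories and words are illustrative only, not LIWC content.
LEXICON = {
    "negative_emotion": {"sad", "hopeless", "empty", "hurt", "alone"},
    "death": {"died", "funeral", "loss", "grave"},
    "first_person": {"i", "me", "my", "myself"},
}
CATEGORIES = sorted(LEXICON)


def features(post):
    """Return, per category, the share of the post's words in that category."""
    words = re.findall(r"[a-z']+", post.lower())
    total = len(words) or 1  # avoid division by zero on empty posts
    counts = Counter()
    for word in words:
        for cat in CATEGORIES:
            if word in LEXICON[cat]:
                counts[cat] += 1
    return [counts[cat] / total for cat in CATEGORIES]


def centroids(labelled_posts):
    """Average the feature vectors of the posts in each subgroup."""
    sums, sizes = {}, Counter()
    for post, label in labelled_posts:
        acc = sums.setdefault(label, [0.0] * len(CATEGORIES))
        for i, value in enumerate(features(post)):
            acc[i] += value
        sizes[label] += 1
    return {lab: [v / sizes[lab] for v in acc] for lab, acc in sums.items()}


def predict(post, cents):
    """Assign the subgroup whose centroid is closest in Euclidean distance."""
    vec = features(post)
    return min(cents, key=lambda lab: math.dist(vec, cents[lab]))


# Invented toy posts; the real study sampled posts from online communities.
TRAIN = [
    ("I feel so sad and empty, I am alone", "depression"),
    ("my days are hopeless and I hurt", "depression"),
    ("my father died and the funeral was hard", "grief"),
    ("the loss of my mother, her grave", "grief"),
]
cents = centroids(TRAIN)
print(predict("after the funeral the loss still weighs on us", cents))
```

A realistic version would replace the toy lexicon with a validated dictionary, add topic proportions from a topic model as further features, and evaluate the classifier on held-out posts.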

Notes

  1. http://www.livejournal.com/interests.bml

  2. http://www.liwc.net/descriptiontable1.php, retrieved January 2015.

  3. All 50 topics learned from the corpus by LDA are placed at http://bit.ly/1JKY2vo

Author information

Corresponding author

Correspondence to Thin Nguyen.

Cite this article

Nguyen, T., O’Dea, B., Larsen, M. et al. Using linguistic and topic analysis to classify sub-groups of online depression communities. Multimed Tools Appl 76, 10653–10676 (2017). https://doi.org/10.1007/s11042-015-3128-x
