DOI: 10.1145/2808797.2808908
research-article

Overcoming Data Scarcity of Twitter: Using Tweets as Bootstrap with Application to Autism-Related Topic Content Analysis

Published: 25 August 2015

ABSTRACT

Notwithstanding recent work that has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.
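The bootstrap step described in the abstract (follow the URLs in a tweet, extract the page content, and treat each page as a document linked to the originating tweet) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the URL pattern, the function names `extract_urls`, `html_to_text` and `build_documents`, and the caller-supplied `fetch_page` function are all assumptions made for illustration. The temporal topic model itself is not covered.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: an http(s) URL runs until the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(tweet_text):
    """Return the URLs found in a tweet, with trailing punctuation removed."""
    return [url.rstrip(".,;:!?)\"'") for url in URL_PATTERN.findall(tweet_text)]


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Reduce an HTML page to its visible text, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_documents(tweets, fetch_page):
    """Create one document per URL, linked back to the tweet that contained it.

    `tweets` is an iterable of (tweet_id, text) pairs. `fetch_page` is a
    hypothetical caller-supplied function mapping a URL to an HTML string,
    for example a wrapper around urllib.request.urlopen.
    """
    documents = []
    for tweet_id, text in tweets:
        for url in extract_urls(text):
            page_text = html_to_text(fetch_page(url))
            documents.append({"tweet_id": tweet_id, "url": url, "text": page_text})
    return documents
```

Passing `fetch_page` as a parameter keeps the sketch free of network access, so the extraction logic can be exercised on canned HTML before any crawling is attempted.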

          Published in

          ASONAM '15: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015
          August 2015
          835 pages
          ISBN: 9781450338547
          DOI: 10.1145/2808797

          Copyright © 2015 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 116 of 549 submissions, 21%
