ABSTRACT
Although recent work has demonstrated the potential of Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach to targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of the complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method captures a much more meaningful picture of information exchange than user-chosen hashtags do.
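The bootstrapping step described in the abstract (following URLs found in tweets and treating each linked page as a document tied to its tweet) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the URL pattern, the function names `extract_urls`, `html_to_text` and `build_documents`, and the caller-supplied `fetch` function are all assumptions made for illustration, and the text extraction is deliberately crude.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: an http(s) URL runs until the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(tweet_text):
    """Return URLs found in a tweet, with trailing punctuation stripped."""
    return [url.rstrip(".,;:!?)\"'") for url in URL_PATTERN.findall(tweet_text)]


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Reduce an HTML page to a single whitespace-joined text string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_documents(tweets, fetch):
    """Link each tweet to the text of every page it points to.

    tweets: iterable of (tweet_id, tweet_text) pairs.
    fetch:  hypothetical caller-supplied function mapping a URL to HTML.
    """
    documents = []
    for tweet_id, text in tweets:
        for url in extract_urls(text):
            documents.append(
                {"tweet_id": tweet_id, "url": url, "text": html_to_text(fetch(url))}
            )
    return documents
```

Passing `fetch` as a parameter keeps the sketch free of network code; in practice it would wrap an HTTP client and handle redirects from URL shorteners, timeouts, and non-HTML responses. The resulting documents would then be the input to a topic model, which is outside the scope of this sketch.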
Overcoming Data Scarcity of Twitter: Using Tweets as Bootstrap with Application to Autism-Related Topic Content Analysis