Abstract
Scientific networks play an increasingly important role in the diffusion of knowledge and techniques. In such networks, highly influential nodes (scientists or publications) tend to stimulate other researchers to generate innovative ideas. The objective of this study is to detect topic-level influencers from a large collection of links between nodes and the textual content of scientific networks. For this purpose, we propose a sparse link topic model (SLTM) that introduces a “Spike and Slab” prior to achieve sparsity in the node-topic distribution. Unlike previous approaches, our model assumes that a node usually focuses on a few salient topics rather than a wide range of topics, which is useful for learning topic-level influencers in scientific networks. In addition, a collapsed variational Bayesian (CVB) inference algorithm is designed for large-scale applications. Our experiments are conducted on a large scientific collaboration network. The results show that the proposed model significantly improves the precision of topic-level influencer detection. Our analysis also shows that SLTM can explicitly model the sparse topical structure of each node in the network.
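To make the sparsity assumption concrete, the following is a minimal sketch of how a generic spike-and-slab style prior can produce a node-topic distribution concentrated on a few topics. It is not the SLTM generative process itself, which is not given in this abstract: the function name, the parameters `select_prob` and `alpha`, and the guard that forces at least one active topic are all assumptions made for illustration, and the sketch ignores links and any smoothing term.

```python
import numpy as np


def sample_sparse_topic_distribution(num_topics, select_prob, alpha, rng):
    """Draw one node-topic distribution under a generic spike-and-slab style prior.

    All names and parameters here are hypothetical; they are not taken from the paper.
    """
    # "Spike": Bernoulli selectors decide which topics the node focuses on.
    selectors = rng.random(num_topics) < select_prob
    if not selectors.any():
        # Assumed guard: keep at least one topic so the distribution is defined.
        selectors[rng.integers(num_topics)] = True

    # "Slab": a symmetric Dirichlet over the selected topics only.
    theta = np.zeros(num_topics)
    theta[selectors] = rng.dirichlet(np.full(selectors.sum(), alpha))
    return selectors, theta


rng = np.random.default_rng(0)
selectors, theta = sample_sparse_topic_distribution(
    num_topics=20, select_prob=0.15, alpha=1.0, rng=rng
)
print("active topics:", np.flatnonzero(selectors))
print("node-topic distribution:", np.round(theta, 3))
```

Topics whose selector is off receive exactly zero probability, so the node's mass sits on a small set of salient topics, whereas a plain Dirichlet prior would spread some mass over every topic.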
Notes
In this study, we set ϵ = 0.01.
Acknowledgements
This work is supported by the National Key R&D Program of China (No. 2018YFB1402600), the Major Program of the National Natural Science Foundation of China (91846201, 71490725), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (71521001), the National Natural Science Foundation of China (71722010, 91546114, 91746302, 71872060), the National Key Research and Development Program of China (2017YFB0803303), and Zhijiang Lab (No. 2019KE0AB04). We would like to thank Dr. Junming Yin (University of Arizona) for his suggestion to use collapsed variational Bayesian (CVB) inference for the proposed model. We also thank Prof. Chunhua Sun for helping to proofread our paper.
Additional information
This article belongs to the Topical Collection: Special Issue on Graph Data Management in Online Social Networks
Guest Editors: Kai Zheng, Guanfeng Liu, Mehmet A. Orgun, and Junping Du
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Qian, Y., Liu, Y., Jiang, Y. et al. Detecting topic-level influencers in large-scale scientific networks. World Wide Web 23, 831–851 (2020). https://doi.org/10.1007/s11280-019-00751-4
DOI: https://doi.org/10.1007/s11280-019-00751-4