Detecting topic-level influencers in large-scale scientific networks

World Wide Web

Abstract

Scientific networks play an increasingly important role in the diffusion of knowledge and techniques. In such networks, highly influential nodes (scientists or publications) tend to stimulate other researchers to generate innovative ideas. The objective of this study is to detect topic-level influencers from a large collection of links between nodes and the textual content attached to those nodes. For this purpose, we propose a sparse link topic model (SLTM) that introduces a "Spike and Slab" prior to achieve sparsity in the node-topic distribution. Unlike previous approaches, our model assumes that a node usually focuses on a few salient topics rather than a wide range of topics, which is useful for learning topic-level influencers in scientific networks. In addition, a collapsed variational Bayesian (CVB) inference algorithm is designed for large-scale applications. Our experiments are conducted on a large scientific collaboration network. The results show that the proposed model significantly improves the precision of topic-level influencer detection. Our analysis also shows that SLTM can explicitly model the sparse topical structure of each node in the network.
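The sparsity assumption described in the abstract can be illustrated with a minimal sketch. This is not the authors' SLTM and it does not perform CVB inference; it only shows how a spike-and-slab style prior can give a node exactly zero mass on most topics. The function name and the parameters `select_prob` and `slab_alpha` are hypothetical and are not taken from the paper.

```python
import random


def sample_sparse_topic_distribution(num_topics, select_prob, slab_alpha, rng):
    """Draw one node's topic distribution under a spike-and-slab style prior.

    Each topic gets a Bernoulli selector. A selector of 0 is the "spike":
    the topic is switched off and receives exactly zero mass. A selector
    of 1 is the "slab": the topic shares a symmetric Dirichlet with the
    other selected topics.
    """
    selectors = [1 if rng.random() < select_prob else 0 for _ in range(num_topics)]
    if not any(selectors):
        # Assumption made for this sketch only: switch on one topic so the
        # result is still a proper probability distribution.
        selectors[rng.randrange(num_topics)] = 1

    # A Dirichlet draw is a vector of Gamma(alpha, 1) draws, normalised.
    weights = [rng.gammavariate(slab_alpha, 1.0) if s else 0.0 for s in selectors]
    total = sum(weights)
    theta = [w / total for w in weights]
    return selectors, theta


rng = random.Random(0)
selectors, theta = sample_sparse_topic_distribution(
    num_topics=20, select_prob=0.15, slab_alpha=1.0, rng=rng
)
salient = [k for k, s in enumerate(selectors) if s]
print("salient topics:", salient)
print("mass on salient topics:", [round(theta[k], 3) for k in salient])
```

With a small `select_prob`, most selectors are 0, so the node concentrates on a handful of topics. A plain symmetric Dirichlet prior, by contrast, assigns every topic some nonzero mass.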

Notes

  1. https://github.com/soberqian/SparseLinkTModel

  2. https://dl.acm.org/

  3. https://www.aminer.cn/citation

  4. In this study, we set ϵ = 0.01.

Acknowledgements

This work is supported by the National Key R&D Program of China (No. 2018YFB1402600), the Major Program of the National Natural Science Foundation of China (91846201, 71490725), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (71521001), the National Natural Science Foundation of China (71722010, 91546114, 91746302, 71872060), the National Key Research and Development Program of China (2017YFB0803303), and Zhijiang Lab (No. 2019KE0AB04). We would like to thank Dr. Junming Yin (University of Arizona) for his suggestion to use collapsed variational Bayesian (CVB) inference for the proposed model. We also thank Prof. Chunhua Sun for helping to proofread our paper.

Author information

Corresponding author

Correspondence to Yuanchun Jiang.

Additional information

This article belongs to the Topical Collection: Special Issue on Graph Data Management in Online Social Networks

Guest Editors: Kai Zheng, Guanfeng Liu, Mehmet A. Orgun, and Junping Du

About this article

Cite this article

Qian, Y., Liu, Y., Jiang, Y. et al. Detecting topic-level influencers in large-scale scientific networks. World Wide Web 23, 831–851 (2020). https://doi.org/10.1007/s11280-019-00751-4

  • DOI: https://doi.org/10.1007/s11280-019-00751-4
