Distributed Pregel-based provenance-aware regular path query processing on RDF knowledge graphs

World Wide Web

Abstract

With the proliferation of knowledge graphs, massive RDF graphs have been published on the Web. Regular Path Queries (RPQs), an essential type of query over RDF graphs, have attracted increasing research attention. However, existing query processing approaches mainly focus on RPQs under the standard semantics, which cannot provide the provenance of the answer sets. We propose DP2RPQ, a distributed Pregel-based approach for evaluating provenance-aware RPQs over big RDF graphs. Our method employs Glushkov automata to keep track of the matching processes of RPQs in parallel. In addition, guided by a cost model, we devise three optimization strategies (vertex-computation optimization, message-communication reduction, and counting-paths alleviation), which dramatically reduce the intermediate results of the basic DP2RPQ algorithm and partially overcome the counting-paths problem. Extensive experiments on both synthetic and real-world datasets verify the proposed algorithms and show that our approach can efficiently answer provenance-aware RPQs over large RDF graphs. Furthermore, DP2RPQ supports richer RPQ semantics than RDFPath while still performing far better.
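The abstract only names the ingredients of DP2RPQ. As a rough, hedged illustration of how they can fit together, the following sketch builds a Glushkov automaton for a regular expression over edge labels and then propagates (vertex, automaton state) messages in Pregel-like supersteps, recording every matched edge so that each answer carries a provenance set. It is a single-machine, single-source simulation written for illustration, not the authors' algorithm: the tuple-based regex AST, the names `glushkov`, `evaluate_rpq`, `EXAMPLE_TRIPLES` and `EXAMPLE_QUERY`, and the choice to define provenance as the set of triples lying on at least one matching path are all assumptions. None of the paper's three optimizations is modeled.

```python
from collections import defaultdict

# Hypothetical regex AST for this sketch: ("sym", label), ("cat", r1, r2),
# ("alt", r1, r2) and ("star", r).


def glushkov(ast):
    """Glushkov automaton: state 0 is initial; states 1..n are the label
    positions of the expression, numbered left to right."""
    labels = []                # labels[p - 1] is the edge label at position p
    follow = defaultdict(set)  # position -> positions that may come next

    def walk(node):
        """Return (nullable, first, last) for a subexpression."""
        kind = node[0]
        if kind == "sym":
            labels.append(node[1])
            return False, {len(labels)}, {len(labels)}
        if kind == "star":
            n, f, l = walk(node[1])
            for p in l:        # the end of r may loop back to its start
                follow[p] |= f
            return True, f, l
        if kind in ("alt", "cat"):
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            if kind == "alt":
                return n1 or n2, f1 | f2, l1 | l2
            for p in l1:       # the end of r1 may be followed by the start of r2
                follow[p] |= f2
            first = f1 | f2 if n1 else f1
            last = l1 | l2 if n2 else l2
            return n1 and n2, first, last
        raise ValueError(f"unknown node kind: {kind}")

    nullable, first, last = walk(ast)
    delta = defaultdict(set)   # (state, label) -> next states
    for p in first:
        delta[(0, labels[p - 1])].add(p)
    for p, nxt in follow.items():
        for q in nxt:          # every transition into q reads q's label
            delta[(p, labels[q - 1])].add(q)
    accepting = set(last) | ({0} if nullable else set())
    return dict(delta), accepting


def evaluate_rpq(triples, ast, source):
    """Single-source provenance-aware RPQ evaluated in superstep style.
    Returns {target: frozenset of triples on some matching source->target path}."""
    delta, accepting = glushkov(ast)
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))

    visited = {(source, 0)}    # (vertex, state) pairs reached so far
    messages = [(source, 0)]
    product_edges = []         # ((v, q), (w, q2), matched triple)
    while messages:            # one loop iteration plays the role of one superstep
        outbox = []
        for v, q in messages:  # "vertex program": extend each partial match by one edge
            for label, w in out_edges[v]:
                for q2 in delta.get((q, label), ()):
                    product_edges.append(((v, q), (w, q2), (v, label, w)))
                    if (w, q2) not in visited:  # never re-expand a seen pair, so cycles terminate
                        visited.add((w, q2))
                        outbox.append((w, q2))
        messages = outbox

    reverse = defaultdict(list)
    for a, b, _ in product_edges:
        reverse[b].append(a)

    answers = {}
    for t in {v for v, q in visited if q in accepting}:
        # Backward pass: product nodes that can still reach an accepting (t, q).
        coreach = {(t, q) for q in accepting if (t, q) in visited}
        stack = list(coreach)
        while stack:
            for a in reverse[stack.pop()]:
                if a not in coreach:
                    coreach.add(a)
                    stack.append(a)
        # Each recorded edge starts at a forward-reachable node, so it lies on a
        # matching path to t exactly when its head can still reach t.
        answers[t] = frozenset(tr for _, b, tr in product_edges if b in coreach)
    return answers


EXAMPLE_TRIPLES = [
    ("a", "knows", "b"), ("b", "knows", "c"), ("c", "worksAt", "d"),
    ("b", "worksAt", "e"), ("a", "likes", "c"),
]
EXAMPLE_QUERY = ("cat", ("star", ("sym", "knows")), ("sym", "worksAt"))  # knows*/worksAt

if __name__ == "__main__":
    for target, prov in sorted(evaluate_rpq(EXAMPLE_TRIPLES, EXAMPLE_QUERY, "a").items()):
        print(target, sorted(prov))
```

On the toy graph, the query `knows*/worksAt` issued from `a` returns the targets `d` and `e`, and the provenance of `e` is {(a, knows, b), (b, worksAt, e)}. Because a (vertex, state) pair is expanded at most once, the loop also terminates on cyclic graphs that contain infinitely many matching paths, while the provenance set stays finite. This only conveys the intuition behind the counting-paths issue discussed by Arenas et al. [1]; it is not a description of how DP2RPQ implements its counting-paths alleviation.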

[Figures 1–21]

Notes

  1. https://neo4j.com/

  2. http://research.microsoft.com/en-us/projects/trinity/

  3. https://cloud.tencent.com/

  4. http://swat.cse.lehigh.edu/projects/lubm/

  5. http://dsg.uwaterloo.ca/watdiv/

  6. http://wiki.dbpedia.org/

  7. http://dsg.uwaterloo.ca/watdiv/watdiv-data-model.txt

References

  1. Arenas, M., Conca, S., Pérez, J.: Counting beyond a yottabyte, or how SPARQL 1.1 property paths will prevent adoption of the standard. In: Proceedings of the 21st International Conference on World Wide Web, pp 629–638. ACM (2012)

  2. Avery, C.: Giraph: Large-scale graph processing infrastructure on Hadoop. Proc. Hadoop Summit Santa Clara 11(3), 5–9 (2011)

  3. Bai, Y., Wang, C., Ning, Y., Wu, H., Wang, H.: G-path: Flexible path pattern query on large graphs. In: Proceedings of the 22nd International Conference on World Wide Web, pp 333–336. ACM (2013)

  4. Bai, Y., Wang, C., Ying, X., Wang, M., Gong, Y.: Path pattern query processing on large graphs. In: IEEE Fourth International Conference on Big Data & Cloud Computing (2014)

  5. Bai, Y., Wang, C., Ying, X.: Para-G: Path pattern query processing on large graphs. World Wide Web 20(3), 515–541 (2017)

  6. Barceló, P., Libkin, L., Lin, A.W., Wood, P.T.: Expressive languages for path queries over graph-structured data. ACM Trans. Database Syst. (TODS) 37(4), 31 (2012)

  7. Brüggemann-Klein, A.: Regular expressions into finite automata. Theor. Comput. Sci. 120(2), 197–213 (1993)

  8. Brzozowski, J.A.: Derivatives of regular expressions. J. ACM (JACM) 11(4), 481–494 (1964)

  9. Calvanese, D., De Giacomo, G., Lenzerini, M., Vardi, M.Y.: Answering regular path queries using views. In: Proceedings of the 16th International Conference on Data Engineering, pp 389–398. IEEE (2000)

  10. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

  11. Dey, S., Cuevas-Vicenttín, V., Köhler, S., Gribkoff, E., Wang, M., Ludäscher, B.: On implementing provenance-aware regular path queries with relational query engines. In: Proceedings of the Joint EDBT/ICDT 2013 Workshops, pp 214–223. ACM (2013)

  12. Gerbessiotis, A.V., Valiant, L.G.: Direct bulk-synchronous parallel algorithms. J. Parallel Distrib. Comput. 22(2), 251–267 (1994)

  13. Harris, S., Seaborne, A., Prud’hommeaux, E.: SPARQL 1.1 query language. W3C Recommendation, 21 March 2013

  14. Jupp, S., Malone, J., Bolleman, J., Brandizi, M., Davies, M., Garcia, L., Gaulton, A., Gehant, S., Laibe, C., Redaschi, N., et al.: The EBI RDF platform: linked open data for the life sciences. Bioinformatics 30(9), 1338–1339 (2014)

  15. Koschmieder, A., Leser, U.: Regular path queries on large graphs. In: International Conference on Scientific and Statistical Database Management, pp 177–194. Springer (2012)

  16. Kostylev, E.V., Reutter, J.L., Romero, M., Vrgoč, D.: SPARQL with property paths. In: International Semantic Web Conference, pp 3–18. Springer (2015)

  17. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., Van Kleef, P., Auer, S., et al.: DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167–195 (2015)

  18. Libkin, L., Martens, W., Vrgoč, D.: Querying graph databases with XPath. In: Proceedings of the 16th International Conference on Database Theory, pp 129–140. ACM (2013)

  19. Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: A system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp 135–146. ACM (2010)

  20. Nolé, M., Sartiani, C.: Regular path queries on massive graphs. In: Proceedings of the 28th International Conference on Scientific and Statistical Database Management, p 13. ACM (2016)

  21. Nolé, M., Sartiani, C.: A distributed implementation of GXPath. In: EDBT/ICDT Workshops (2016)

  22. Przyjaciel-Zablocki, M., Schätzle, A., Hornung, T., Lausen, G.: RDFPath: Path query processing on large RDF graphs with MapReduce. In: Extended Semantic Web Conference, pp 50–64. Springer (2011)

  23. Tong, Y., She, J., Meng, R.: Bottleneck-aware arrangement over event-based social networks: The max-min approach. World Wide Web 19(6), 1151–1177 (2016)

  24. Wang, X., Ling, J., Wang, J., Wang, K., Feng, Z.: Answering provenance-aware regular path queries on RDF graphs using an automata-based algorithm. In: Proceedings of the 23rd International Conference on World Wide Web, pp 395–396. ACM (2014)

  25. Wang, X., Wang, J.: ProvRPQ: An interactive tool for provenance-aware regular path queries on RDF graphs. In: Australasian Database Conference, pp 480–484. Springer (2016)

  26. Wang, X., Wang, J., Zhang, X.: Efficient distributed regular path queries on RDF graphs using partial evaluation. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp 1933–1936. ACM (2016)

  27. Wang, M., Zhang, J., Liu, J., Hu, W., Wang, S., Li, X., Liu, W.: PDD graph: Bridging electronic medical records and biomedical knowledge graphs via entity linking. In: International Semantic Web Conference, pp 219–227. Springer (2017)

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61572353) and the Natural Science Foundation of Tianjin (17JCYBJC15400).

Author information

Correspondence to Xiaofei Wang.

About this article
Cite this article

Wang, X., Wang, S., Xin, Y. et al. Distributed Pregel-based provenance-aware regular path query processing on RDF knowledge graphs. World Wide Web 23, 1465–1496 (2020). https://doi.org/10.1007/s11280-019-00739-0

  • DOI: https://doi.org/10.1007/s11280-019-00739-0