Finding smallest k-Compact tree set for keyword queries on graphs using mapreduce

World Wide Web

Abstract

Keyword search is integrated into many applications because it gives users a convenient way to express their query intent. Most existing work on keyword search over graphs models the query results as individual minimal connected trees or connected graphs that contain the keywords. We observe that significant overlap may exist among those query results, which hurts result diversification. Moreover, most solutions require the graph data and pre-built indexes to be accessed in memory, which is unsuitable for processing large datasets. In this paper, we define the smallest k-compact tree set as the keyword query result, where no two compact trees share a graph node. We then develop a progressive, A*-based scalable solution using MapReduce to compute the smallest k-compact tree set, in which the computation can be stopped as soon as the generated compact tree set is sufficient to compute the keyword query result. We conduct experiments to show the efficiency of our proposed algorithm.
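
The node-disjointness requirement can be illustrated with a small sketch. It is not the paper's A*-based MapReduce algorithm. It is a brute-force illustration that assumes each candidate compact tree is given as a set of graph nodes with a numeric cost, and that "smallest" means minimal total cost; the names `Tree`, `pairwise_disjoint`, and `smallest_k_disjoint` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Tuple

# Assumption: a candidate compact tree is modelled as (node set, cost).
Tree = Tuple[FrozenSet[str], float]


def pairwise_disjoint(trees: Sequence[Tree]) -> bool:
    """Return True if no graph node appears in more than one tree."""
    seen = set()
    for nodes, _cost in trees:
        if seen & nodes:
            return False
        seen |= nodes
    return True


def smallest_k_disjoint(candidates: Sequence[Tree], k: int) -> Optional[List[Tree]]:
    """Brute force: try every k-subset and keep the cheapest node-disjoint one.

    Assumption: "smallest" is read as minimal total cost. Returns None when
    no k pairwise node-disjoint trees exist among the candidates.
    """
    best: Optional[List[Tree]] = None
    best_cost = float("inf")
    for combo in combinations(candidates, k):
        if not pairwise_disjoint(combo):
            continue
        cost = sum(c for _nodes, c in combo)
        if cost < best_cost:
            best, best_cost = list(combo), cost
    return best


# Hypothetical candidates: the first two share node "c" and cannot co-occur.
candidates: List[Tree] = [
    (frozenset({"a", "b", "c"}), 2.0),
    (frozenset({"c", "d"}), 1.0),
    (frozenset({"e", "f"}), 1.5),
    (frozenset({"g", "h", "i"}), 3.0),
]
result = smallest_k_disjoint(candidates, 2)
```

Exhaustive enumeration grows combinatorially with the number of candidates, which is consistent with the abstract's motivation for a progressive search that stops once enough compact trees have been generated.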

Acknowledgments

This work is supported by ARC DP120102627, ARC DP140103499 and NSFC 61170007.

Author information

Corresponding author

Correspondence to Chengfei Liu.

About this article

Cite this article

Liu, C., Yao, L., Li, J. et al. Finding smallest k-Compact tree set for keyword queries on graphs using mapreduce. World Wide Web 19, 499–518 (2016). https://doi.org/10.1007/s11280-015-0337-1

  • DOI: https://doi.org/10.1007/s11280-015-0337-1
