Abstract
Keyword search enables inexperienced users to easily search XML database with no specific knowledge of complex structured query languages and XML data schemas. Existing work has addressed the problem of selecting data nodes that match keywords and connecting them in a meaningful way, e.g., SLCA and ELCA. However, it is time-consuming and unnecessary to serve all the connected subtrees to the users because in general the users are only interested in part of the relevant results. In this paper, we propose a new keyword search approach which basically utilizes the statistics of underlying XML data to decide the promising result types and then quickly retrieves the corresponding results with the help of selected promising result types. To guarantee the quality of the selected promising result types, we measure the correlations between result types and a keyword query by analyzing the distribution of relevant keywords and their structures within the XML data to be searched. In addition, relevant result types can be efficiently computed without keyword query evaluation and any schema information. To directly return top-k keyword search results that conform to the suggested promising result types, we design two new algorithms to adapt to the structural sensitivity of the keyword nodes over the keyword search results. Lastly, we implement all proposed approaches and present the relevant experimental results to show the effectiveness of our approach.
Similar content being viewed by others
References
Arampatzis, A.T., Kamps, J.: A study of query length. In: SIGIR, pp. 811–812 (2008)
Bao, Z., Ling, T.W., Chen, B., Lu, J.: Effective xml keyword search with relevance oriented ranking. In: ICDE, pp. 517–528 (2009)
Cohen, S., Mamou, J., Kanza, Y., Sagiv, Y.: XSEarch: a semantic search engine for XML. In: VLDB, pp. 45–56 (2003)
Denoyer, L., Gallinari, P.: The wikipedia xml corpus. SIGIR Forum 40(1), 64–69 (2006)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS, pp. 102–113 (2001)
Florescu, D., Kossmann, D., Manolescu, I.: Integrating keyword search into XML query processing. Comput. Networks 33(1–6), 119–135 (2000)
Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: ranked keyword search over XML documents. In: SIGMOD Conference, pp. 16–27 (2003)
Hadjieleftheriou, M., Chandel, A., Koudas, N., Srivastava, D.: Fast indexes and algorithms for set similarity selection queries. In: ICDE, pp. 267–276 (2008)
Hristidis, V., Papakonstantinou, Y., Balmin, A.: Keyword proximity search on xml graphs. In: ICDE, pp. 367–378 (2003)
iProspect: Iprospect natural seo keyword length study (Nuvember 2004). Technical report, iProspect (2004)
Kong, L., Gilleron, R., Lemay, A.: Retrieving meaningful relaxed tightest fragments for xml keyword search. In: EDBT, pp. 815–826 (2009)
Koutrika, G., Simitsis, A., Ioannidis, Y.E.: Précis: the essence of a query answer. In: ICDE, pp. 69–78 (2006)
Kulkarni, S., Caragea, D.: Computation of the semantic relatedness between words using concept clouds. In: KDIR, pp. 183–188 (2009)
Li, Y., Yu, C., Jagadish, H.V.: Schema-free XQuery. In: VLDB, pp. 72–83 (2004)
Li, G., Feng, J., Wang, J., Zhou, L.: Effective keyword search for valuable lcas over xml documents. In: CIKM, pp. 31–40 (2007)
Li, J., Liu, C., Zhou, R.,Wang, W.: Suggestion of promising result types forXMLkeyword search. In: EDBT, pp. 561–572 (2010)
Li, J., Liu, C., Zhou, R., Wang, W.: Top-k keyword search over probabilistic XML data. In: ICDE, pp. 673–684 (2011)
Liu, Z., Chen, Y.: Identifying meaningful return information for xml keyword search. In: SIGMOD Conference, pp. 329–340 (2007)
Liu, Z., Chen, Y.: Reasoning and identifying relevant matches for xml keyword search. PVLDB 1(1), 921–932 (2008)
Liu, Z., Sun, P., Chen, Y.: Structured search result differentiation. PVLDB 2(1), 313–324 (2009)
Liu, C., Li, J., Yu, J.X., Zhou, R.: Adaptive relaxation for querying heterogeneous XML data sources. Inf. Syst. 35(6), 688–707 (2010)
Polyzotis, N., Garofalakis, M.N.: Structure and value synopses for xml data graphs. In: VLDB, pp. 466–477 (2002)
Polyzotis, N., Garofalakis, M.N., Ioannidis, Y.E.: Selectivity estimation for xml twigs. In: ICDE, pp. 264–275 (2004)
Sun, C., Chan, C.Y., Goenka, A.K.: Multiway slca-based keyword search in xml data. In: WWW, pp. 1043–1052 (2007)
Termehchy, A., Winslett, M.: Effective, design-independent xml keyword search. In: CIKM, pp. 107–116, (2009)
Termehchy, A., Winslett, M.: Using structural information in xml keyword search effectively. ACM Trans. Database Syst. 36(1), 4 (2011)
Xu, Y., Papakonstantinou, Y.: Efficient keyword search for smallest LCAs in XML databases. In: SIGMOD Conference, pp. 537–538 (2005)
Xu, Y., Papakonstantinou, Y.: Efficient lca based keyword search in xml data. In: EDBT, pp. 535–546 (2008)
Zhou, R., Liu, C., Li, J.: Fast elca computation for keyword queries on xml data. In: EDBT, pp. 549–560 (2010)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Li, J., Liu, C., Zhou, R. et al. XML keyword search with promising result type recommendations. World Wide Web 17, 127–159 (2014). https://doi.org/10.1007/s11280-012-0198-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-012-0198-9