
K. Dießel

Paper "Self-Supervised Web Search for Any-k Complete Tuples" presented at BeWeb Workshop (EDBT 2011)

The following paper {3} was presented at the BeWeb Workshop of EDBT 2011 {1,2}:

Self-Supervised Web Search for Any-k Complete Tuples

A common task of Web users is querying structured information from Web pages. In this paper we propose a novel query processor for systematically discovering any-k relations from Web search results with conjunctive queries. The ‘any-k’ phrase denotes that retrieved tuples are not ranked by the system.

To realize this scenario, the query processor translates a structured query into keyword queries that are submitted to a search engine, forwards the search results to relation extractors, and then combines the extracted relations into result tuples.
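The three steps described above (keyword query generation, extraction, combination) can be pictured with a minimal Python sketch. It is not the paper's implementation: the function names, the toy search engine, the string-matching extractors, and the join on a single shared attribute are all assumptions made for illustration.

```python
from collections import defaultdict

def to_keyword_queries(relations):
    """Turn each relation of a structured query into one keyword query (hypothetical mapping)."""
    return {name: " ".join(keywords) for name, keywords in relations.items()}

def run_pipeline(relations, search, extractors, join_key):
    """Search per relation, extract tuples, then combine them on a shared attribute."""
    rows = defaultdict(dict)
    for name, query in to_keyword_queries(relations).items():
        for page in search(query):
            for tup in extractors[name](page):
                rows[tup[join_key]].update(tup)
    return list(rows.values())

# Toy stand-ins for a search engine and two relation extractors (assumed, not from the paper).
PAGES = {
    "company headquarters": ["Acme is based in Berlin."],
    "company ceo": ["Acme is led by Jane Doe."],
}

def toy_search(query):
    return PAGES.get(query, [])

def extract_hq(page):
    company, _, city = page.rstrip(".").partition(" is based in ")
    return [{"company": company, "city": city}] if city else []

def extract_ceo(page):
    company, _, ceo = page.rstrip(".").partition(" is led by ")
    return [{"company": company, "ceo": ceo}] if ceo else []

result = run_pipeline(
    {"hq": ["company", "headquarters"], "ceo": ["company", "ceo"]},
    toy_search,
    {"hq": extract_hq, "ceo": extract_ceo},
    join_key="company",
)
```

In this toy run, `result` holds a single combined tuple for "Acme" with both a city and a CEO. If one extractor had returned nothing, the tuple would stay incomplete.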

Unfortunately, relation extractors may fail to return a relation for a result tuple. We propose a solid information theory-based approach for retrieving missing attribute values of partially retrieved relations. Moreover, user-defined data sources may not return at least k complete result tuples. To solve this problem, we extend the Eddy query processing mechanism [14] for our ‘querying the Web’ scenario with a continuous, adaptive routing model. The model determines the most promising next incomplete row for returning any-k complete result tuples at any point during the query execution process.
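The idea of always working next on the most promising incomplete row can be sketched as follows. This is only an illustration under stated assumptions: the score used here (a product of assumed per-attribute success rates) is a simple stand-in, not the paper's information-theoretic estimator or its Eddy-based routing model, and `fill_missing`, `success_rate`, and `budget` are hypothetical names.

```python
def is_complete(row, attributes):
    """A row is complete when every attribute has a value."""
    return all(row.get(a) is not None for a in attributes)

def pick_next_row(rows, attributes, success_rate):
    """Return the incomplete row with the highest estimated chance of being completed."""
    def score(row):
        p = 1.0
        for a in attributes:
            if row.get(a) is None:
                p *= success_rate[a]  # assumed per-attribute retrieval success rate
        return p
    incomplete = [r for r in rows if not is_complete(r, attributes)]
    return max(incomplete, key=score) if incomplete else None

def any_k(rows, attributes, success_rate, fill_missing, k, budget=100):
    """Work on one row at a time until k rows are complete or the budget is used up."""
    for _ in range(budget):
        done = [r for r in rows if is_complete(r, attributes)]
        if len(done) >= k:
            return done[:k]
        row = pick_next_row(rows, attributes, success_rate)
        if row is None:
            break
        fill_missing(row)  # hypothetical callback that issues further searches
    return [r for r in rows if is_complete(r, attributes)][:k]

# Toy data: two partially retrieved rows and a lookup that can complete only one of them.
ATTRS = ["company", "city", "ceo"]
rows = [
    {"company": "Acme", "city": "Berlin", "ceo": None},
    {"company": "Globex", "city": None, "ceo": None},
]
rates = {"company": 0.9, "city": 0.5, "ceo": 0.4}
known = {"Acme": {"ceo": "Jane Doe"}}

def toy_fill(row):
    row.update(known.get(row["company"], {}))

top = any_k(rows, ATTRS, rates, toy_fill, k=1)
```

Here the "Acme" row is chosen first because only one attribute is missing, and the loop stops as soon as one complete tuple exists, without ever touching the "Globex" row.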

We report a thorough experimental evaluation over multiple relation extractors. Our experiments demonstrate that our query processor returns complete result tuples while processing only very few Web pages.