A. Borusan

11.04.2018, 15:30 s.t., TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin: "End-to-End Entity Resolution for Structured and Semi-Structured Data" (Prof. Themis Palpanas, Senior Member of the French University Institute (

Entity Resolution (ER) lies at the core of data integration, with a large body of research focusing on both its effectiveness and its time efficiency. Initially, most relevant works were crafted for structured (relational) data that are described by a schema of well-known quality and meaning. With the advent of Big Data, though, these early schema-based approaches became inapplicable, as the scope of ER moved to semi-structured data collections, which abound in noisy, voluminous and highly heterogeneous information.
In this talk, we take a close look at the entire ER workflow (from schema matching to entity clustering), covering both the schema-based and schema-agnostic cases. We will highlight recent works that significantly boost the efficiency of the overall workflow, especially meta-blocking, which cuts down on the computational cost by discarding comparisons that are repeated or that lack sufficient evidence that the compared entities are duplicates. We will conclude with a brief demonstration of JedAI, our open-source reference toolbox for ER, which incorporates most of the state-of-the-art techniques in the area.
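
The idea behind meta-blocking can be illustrated with a minimal Python sketch. It is not JedAI code and not necessarily the exact method presented in the talk: the function names and the toy records are hypothetical, and the particular choices (token blocking, counting shared blocks as edge weight, pruning below the mean weight) are assumptions made for illustration. Each candidate pair is kept once, however many blocks it shares, and pairs with weak co-occurrence evidence are dropped.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(entities):
    """Schema-agnostic blocking: one block per token found in any attribute value."""
    blocks = defaultdict(set)
    for entity_id, attributes in entities.items():
        for value in attributes.values():
            for token in str(value).lower().split():
                blocks[token].add(entity_id)
    # A block with a single entity yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def meta_blocking(blocks):
    """Keep each pair once and prune pairs whose weight is below the mean weight.

    The weight of a pair is the number of blocks the two entities share
    (an illustrative weighting choice, assumed here).
    """
    weights = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1  # repeated comparisons collapse into one edge
    if not weights:
        return set()
    threshold = sum(weights.values()) / len(weights)
    return {pair for pair, weight in weights.items() if weight >= threshold}


# Hypothetical toy records with heterogeneous attribute names.
entities = {
    "e1": {"name": "Anna Schmidt", "city": "Berlin"},
    "e2": {"fullname": "A. Schmidt", "location": "Berlin Germany"},
    "e3": {"name": "John Smith", "city": "Berlin"},
    "e4": {"title": "John Smith", "place": "Paris"},
}

blocks = token_blocking(entities)
candidates = meta_blocking(blocks)
print(sorted(candidates))
```

In this toy input, the blocks suggest six comparisons, of which four are distinct pairs; pruning at the mean weight leaves the two pairs that share more than one token.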