28.05.2013
A. Borusan

TU Berlin/DIMA, Thursday, June 13, 2013, 2:00 p.m. c.t. Location: TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin: "Finding and extracting structure in large datasets" (Prof. Periklis Andritsos, U

Data design has been characterized as a process of arriving at a design that maximizes the
information content of each piece of data (or equivalently, one that minimizes redundancy).
Information content (or redundancy) is measured with respect to a prescribed model for the
data, a model that is often expressed as a set of constraints. In this talk, I consider
the problem of doing data redesign in an environment where the prescribed model is unknown
or incomplete, or is the result of integrated information. Specifically, I consider the problem
of finding structural clues in a relational data instance that may contain missing values and duplicate records.
I propose a set of clustering-based information-theoretic tools for finding structural summaries
that are useful in characterizing the information content of the data, and ultimately useful
in the design of new relational storage spaces. I study the use of these summaries in one specific
physical design task. I also show how these information-theoretic tools can assist in information
extraction tasks and the building of attribute dictionaries in unstructured repositories of
product data.