
18.04.2016
A. Borusan

27.04.2016, 11:15, TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin: "Explaining the outputs of modern data analytics" (Frank McSherry)

We have made substantial progress with modern data analytics, moving well beyond simply counting words. We can determine interesting graph properties (connectivity, reachability, matchings) and maintain them in real time. We can produce a tremendous amount of output, but it isn't clear that we understand it all yet.

In this talk, I'll explain a framework for interactively determining and tracking *explanations* for the outputs of arbitrary differential dataflow computations: subsets of the actual input that reproduce those outputs. In the relational setting, this is known as "provenance" or "lineage", but for big data computations that involve iteration and non-monotonic reducers, existing techniques fall short: they return either (i) too much input data or (ii) too little input data to reproduce the output. We'll fix all of that.
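To make the notion of an explanation concrete, here is a toy sketch for a single reachability output, "node `f` is reachable from node `a`". It is only an illustration of the definition (a subset of the input that reproduces the output), assuming a plain directed edge list and ordinary breadth-first search; it is not the differential dataflow framework from the talk, and the function names `reachable` and `explain_reachability` are hypothetical. The full edge list also reproduces the output, but it includes irrelevant edges (case (i) above); the edges of one BFS path are enough on their own.

```python
from collections import deque


def _adjacency(edges):
    # Build a directed adjacency list from (u, v) pairs.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    return adj


def reachable(edges, source):
    """Return the set of nodes reachable from `source` over directed `edges`."""
    adj = _adjacency(edges)
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def explain_reachability(edges, source, target):
    """Hypothetical helper: return the edges of one BFS path from `source`
    to `target`, a subset of `edges` that alone reproduces the output
    "target is reachable from source". Returns None if it is not reachable."""
    adj = _adjacency(edges)
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if target not in parent:
        return None
    # Walk back from target to source to collect the path's edges.
    path = []
    node = target
    while parent[node] is not None:
        path.append((parent[node], node))
        node = parent[node]
    path.reverse()
    return path


edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e"), ("c", "f")]
explanation = explain_reachability(edges, "a", "f")
print(explanation)                         # [('a', 'b'), ('b', 'c'), ('c', 'f')]
print("f" in reachable(explanation, "a"))  # True: the subset reproduces the output
```

Reachability is monotonic, so any path works as an explanation here. The abstract's point is that once iteration and non-monotonic reducers are involved, picking such a subset is no longer this simple.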

This talk reflects joint work with Zaheer Chothia, John Liagouris, and Mothy Roscoe of the Systems Group at ETH Zurich.