
K. Forster

Paper accepted for publication at this year's CIKM conference.

Title: "Cutty: Aggregate Sharing for User-defined Windows"
Authors: Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl.

To appear in the proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016).

Abstract: Aggregation queries on data streams are evaluated over evolving and often overlapping finite logical views called windows. To enable window discretization, stream processing systems either provide a set of strictly defined windowing primitives (e.g., time windows and sessions), or force programmers to hard-code window definitions as user-defined operators. Query evaluation over sliding windows relies heavily on aggregate sharing techniques in order to reduce redundancy. Existing sharing techniques tackle redundancy by either targeting a very limited class of periodic windows or overgeneralizing at a high cost. As a result, the aggregation of common families of windows such as sessions and punctuations unnecessarily falls back to very expensive best-effort aggregation techniques.
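
To make the redundancy concrete: with a periodic sliding window (say size 4, slide 2), naive evaluation re-aggregates every element that two windows share. The minimal Python sketch below is only an illustration and is not taken from the paper. It contrasts naive evaluation with panes, a classic sharing technique that covers exactly the limited class of periodic windows mentioned above: each element is summed once into a pane of length gcd(size, slide), and each window then combines its panes' partial sums. The function names are hypothetical.

```python
from math import gcd
from typing import List, Sequence


def naive_sliding_sums(xs: Sequence[int], size: int, slide: int) -> List[int]:
    # Recompute each window from scratch: elements in overlaps are summed repeatedly.
    return [sum(xs[s:s + size]) for s in range(0, len(xs) - size + 1, slide)]


def pane_sliding_sums(xs: Sequence[int], size: int, slide: int) -> List[int]:
    # Largest pane length on which every window boundary falls.
    pane = gcd(size, slide)
    # Sum each element exactly once, into its (complete) pane.
    panes = [sum(xs[i:i + pane]) for i in range(0, len(xs) - pane + 1, pane)]
    per_window, step = size // pane, slide // pane
    # Each window is the combination of `per_window` consecutive pane partials.
    return [sum(panes[p:p + per_window])
            for p in range(0, len(panes) - per_window + 1, step)]
```

Both functions return the same sums, but the pane version touches each element only once. It relies on a fixed size and slide, which is why this kind of sharing does not carry over to sessions or punctuation-based windows.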

In this work, we enable the efficient sharing of partial aggregates across a very broad range of windows, specified as user-defined functions (UDFs). To this end, we first introduce the concept of User-Defined Windows (UDWs), a UDF-based programming abstraction that allows users to programmatically define custom windows. We then define semantics for UDWs, based on which we design Cutty, a novel aggregate sharing technique that operates on UDWs. Cutty subsumes and outperforms the state of the art for single and multiple queries. We implemented our techniques on Apache Flink, an open source stream processing platform, and performed experiments demonstrating orders-of-magnitude reductions in aggregation costs compared to the state of the art.
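
The following simplified Python sketch shows, under assumptions of our own, how partial aggregates can be shared across a user-defined window. The UDW is a hypothetical function that, for each element, reports whether a window begins there and which open windows (identified by their start positions) end there. A new slice is cut only where a window begins, so every window result is a combination of shared slice partials. It is an illustration in the spirit of the abstract, not Cutty's actual implementation or Flink's API: it evaluates a single query, never discards old slices, and combines slices with a linear scan rather than a more efficient aggregate store.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical UDW signature (an assumption, not the paper's API):
# udw(pos, value) -> (begins, ended), where `begins` is True if a window
# starts at this element and `ended` lists the start positions of the
# windows that close once this element is included.
UDW = Callable[[int, float], Tuple[bool, List[int]]]


def shared_window_sums(stream: Sequence[float], udw: UDW) -> List[Tuple[int, int, float]]:
    """Return (start, end, sum) per window, sharing partial sums across slices."""
    partials: List[float] = []        # one running partial sum per slice
    first_slice: Dict[int, int] = {}  # window start position -> its first slice
    results: List[Tuple[int, int, float]] = []
    for pos, value in enumerate(stream):
        begins, ended = udw(pos, value)
        # Open a slice at the first element; afterwards cut only where a window begins.
        if begins or not partials:
            partials.append(0.0)
            if begins:
                first_slice[pos] = len(partials) - 1
        partials[-1] += value         # each element updates exactly one partial
        for start in ended:
            # The window covers every slice from its first one to the current one.
            results.append((start, pos, sum(partials[first_slice.pop(start):])))
    return results


def sliding_count(size: int, slide: int) -> UDW:
    """Hypothetical UDW: count-based window of `size` elements every `slide` elements."""
    def udw(pos: int, _value: float) -> Tuple[bool, List[int]]:
        start = pos - size + 1        # start of the window that would end here
        ended = [start] if start >= 0 and start % slide == 0 else []
        return pos % slide == 0, ended
    return udw


def punctuated() -> UDW:
    """Hypothetical punctuation UDW: a 0 closes the open window; the next element opens one."""
    open_at: Optional[int] = None

    def udw(pos: int, value: float) -> Tuple[bool, List[int]]:
        nonlocal open_at
        begins = open_at is None
        if begins:
            open_at = pos
        if value == 0:
            start, open_at = open_at, None
            return begins, [start]
        return begins, []

    return udw
```

For example, with `sliding_count(4, 2)` over the values 1 to 8, the sketch yields the sums 10, 18 and 26 for the windows starting at positions 0, 2 and 4. The same function handles the data-driven `punctuated()` window without any change, which is the point of expressing windows as UDFs.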