29.01.2018
K. Forster

Paper “Scotty: Efficient Window Aggregation for out-of-order Stream Processing” accepted for publishing at ICDE 2018

The paper presents a high throughput operator for window discretization and aggregation.

Scotty: Efficient Window Aggregation for out-of-order Stream Processing , Jonas Traub, Philipp Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. ICDE 2018, Paris, April 16th – 20th.

Abstract :
Computing aggregates over windows is at the core of virtually every stream processing job. Typical stream processing applications involve overlapping windows and, therefore, cause redundant computations. Several techniques prevent this redundancy by sharing partial aggregates among windows. However, these techniques do not support out-of-order processing and session windows. Out-of-order processing is a key requirement to deal with delayed tuples in case of source failures. Session windows are widely used to separate different periods of user activity from each other.
In this paper, we present Scotty, a high throughput operator for window discretization and aggregation. Scotty splits streams into non-overlapping slices and computes partial aggregates per slice. These partial aggregates are shared among all concurrent queries with arbitrary combinations of tumbling, sliding, and session windows. Scotty introduces the first slicing technique which (1) enables stream slicing for session windows in addition to tumbling and sliding windows and (2) processes out-of-order tuples efficiently. Our technique is generally applicable to a broad group of dataflow systems which use a unified batch and stream processing model. Our experiments show that we achieve an order of magnitude higher throughput than alternative state-of-the-art solutions.

Corntributions :
1) we enable stream slicing for session windows in addition to tumbling and sliding windows.
2) we introduce a Slice Manager which retains the minimum number of slices for tumbling, sliding, and session windows when processing tuples out-of-order.
3) we optimize stream slicing for large numbers of concurrent queries with possibly different window types. Thereby, we share partial aggregates among all queries.
4) We experimentally show that Scotty achieves the highest throughput, the smallest memory footprint, and competitive latency compared to state-of-the art techniques.

Link to publication preprint