Martin Pagel

The Paper "Scotty: General and Efficient Open-Source Window Aggregation for Stream Processing Systems" was Accepted for Publication in ACM Transactions on Database Systems

"Scotty: General and Efficient Open-Source Window Aggregation for Stream Processing Systems". Jonas Traub, Philipp Marian Grulich, Alejandro Rodríguez Cuéllar, Sebastian Bress, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. To be published in ACM Transactions on Database Systems (ACM TODS).

This journal paper substantially extends the EDBT 2019 best paper "Efficient Window Aggregation with General Stream Slicing" by the same authors. Among other additions, it provides detailed algorithm specifications, API examples, and examples of using Scotty in different streaming systems.

Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, or minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. In this paper, we present Scotty, an efficient and general open-source operator for sliding-window aggregation in stream processing systems, such as Apache Flink, Apache Beam, Apache Samza, Apache Kafka, Apache Spark, and Apache Storm. One can easily extend Scotty with user-defined aggregation functions and window types. Scotty implements the concept of general stream slicing and derives workload characteristics from aggregation queries to improve performance without sacrificing its general applicability. We provide an in-depth view on the algorithms of the general stream slicing approach. Our experiments show that Scotty outperforms alternative solutions by up to one order of magnitude.
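To give a rough intuition for the stream slicing idea mentioned in the abstract, the following is a minimal sketch in Python. It is not Scotty's API or the algorithm from the paper. It assumes integer, non-negative event timestamps, a single time-based sliding window whose instances start at multiples of the slide beginning at 0, and a sum aggregation (associative and commutative, so out-of-order events are harmless here). The function name `sliced_window_sums` and its parameters are hypothetical and chosen only for illustration.

```python
from math import gcd


def sliced_window_sums(events, length, slide):
    """Sum (timestamp, value) events per sliding window using shared slices.

    Illustrative assumptions: integer timestamps >= 0, windows of the form
    [k * slide, k * slide + length), and sum as the aggregation function.
    """
    # Cut the stream into non-overlapping slices. With this slice size,
    # every window boundary coincides with a slice boundary.
    slice_size = gcd(length, slide)

    # One partial aggregate per slice; each event is aggregated exactly once,
    # no matter how many overlapping windows contain it.
    partials = {}
    for ts, value in events:
        index = ts // slice_size
        partials[index] = partials.get(index, 0) + value

    if not partials:
        return {}

    # Combine the slice partials covered by each window.
    slices_per_window = length // slice_size
    last_slice_start = max(partials) * slice_size
    results = {}
    start = 0
    while start <= last_slice_start:
        first = start // slice_size
        results[(start, start + length)] = sum(
            partials.get(i, 0) for i in range(first, first + slices_per_window)
        )
        start += slide
    return results


if __name__ == "__main__":
    sample = [(1, 10), (4, 20), (6, 5), (11, 7)]
    print(sliced_window_sums(sample, length=10, slide=5))
```

In this sketch, a window of length 10 with slide 5 yields slices of size 5, so each window is the combination of two slice partials and adjacent windows share one slice instead of re-aggregating its events. A general operator as described in the abstract additionally has to handle other window types, windowing measures, and aggregation properties, which this sketch leaves out.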

A preprint version of the paper is available here.