K. Forster

Paper “Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing” accepted for publication at EDBT 2018

This paper examines the ADWIN algorithm and discusses optimizations to provide scalable concept drift detection for high-velocity data streams.

Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing, Philipp M. Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl. 21st International Conference on Extending Database Technology (EDBT), Vienna, March 26-29, 2018.

Abstract:

Machine Learning (ML) techniques for data stream analysis suffer from concept drifts such as changed user preferences, varying weather conditions, or economic changes. These concept drifts cause wrong predictions and lead to incorrect business decisions. Concept drift detection methods such as adaptive windowing (ADWIN) allow for adapting to concept drifts on the fly.

In this paper, we examine ADWIN in detail and point out its throughput bottlenecks. We then introduce several parallelization alternatives to address these bottlenecks. Our optimizations increase the throughput of the original ADWIN approach by two orders of magnitude. Thus, we explore parallel adaptive windowing to provide scalable concept drift detection for high-velocity data streams with millions of tuples per second.

Link to publication preprint