
Juan Soto

24.03.2015: 3 Papers Accepted @ 2015 ACM SIGMOD in Melbourne, Australia

The annual ACM SIGMOD/PODS conference (A*) is a leading international forum for database researchers. Three papers, one demo paper and two full research papers, have been accepted for presentation at this year's conference, May 31 - June 4, 2015:

Our demo paper
Optimistic Recovery for Iterative Dataflows in Action, authored by Sergey Dudoladov, Asterios Katsifodimos, Chen Xu, Stephan Ewen(1), Volker Markl, Sebastian Schelter, Kostas Tzoumas(1),
TU Berlin, DIMA; (1) Data Artisans GmbH.
Over the past years, parallel dataflow systems have been employed for advanced analytics in the field of data mining where many algorithms are iterative.
These systems typically provide fault tolerance by periodically checkpointing the algorithm’s state and, in case of failure, restoring a consistent state from a checkpoint.
In prior work, we presented an optimistic recovery mechanism that in certain cases eliminates the need to checkpoint the intermediate state of an iterative algorithm. In case of failure, our mechanism uses a compensation function to transition the algorithm to a consistent state, from which the execution can continue and successfully converge. Since this recovery mechanism does not checkpoint any state, it achieves optimal failure-free performance while guaranteeing fault tolerance.
In this paper, we demonstrate our recovery mechanism with the Apache Flink data processing engine. During our demonstration, attendees will be able to run graph algorithms and trigger failures to observe the algorithms recovering with compensation functions instead of checkpoints.
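The core idea can be illustrated with a toy PageRank in plain Python (a minimal sketch with invented names, not Flink's actual API): when a simulated failure wipes part of the iteration state, a compensation function re-initializes the lost ranks to the uniform prior instead of restoring a checkpoint, and the fixpoint iteration still converges to the same result as a failure-free run.

```python
def pagerank_with_compensation(links, num_iters=100, damping=0.85, fail_at=None):
    """Toy PageRank over links: vertex -> list of out-neighbours
    (no dangling vertices). At iteration fail_at, a simulated failure
    destroys part of the state; a compensation function heals it."""
    n = len(links)
    ranks = {v: 1.0 / n for v in links}
    for it in range(num_iters):
        if it == fail_at:
            # Simulated failure: every second "partition" loses its state.
            # Compensation function: re-initialize the lost ranks to the
            # uniform prior rather than restoring a checkpoint.
            for i, v in enumerate(links):
                if i % 2 == 0:
                    ranks[v] = 1.0 / n
        new_ranks = {v: (1.0 - damping) / n for v in links}
        for v, outs in links.items():
            share = damping * ranks[v] / len(outs)
            for w in outs:
                new_ranks[w] += share
        ranks = new_ranks
    return ranks
```

Because the PageRank update is a contraction, the perturbation introduced by the compensation decays over the remaining iterations, so no work is spent on checkpoints during failure-free execution.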

Our full research papers
Implicit Parallelism through Deep Language Embedding, authored by Alexander Alexandrov, Felix Schüler, Tobias Herb, Andreas Kunft, Lauritz Thamsen, Asterios Katsifodimos, Odej Kao, Volker Markl,
TU Berlin, DIMA and CIT.
The appeal of MapReduce has spawned a family of systems that implement or extend it. In order to enable parallel collection processing with User-Defined Functions (UDFs), these systems expose extensions of the MapReduce programming model as library-based dataflow APIs that are tightly coupled to their underlying runtime engine. Expressing data analysis algorithms with complex data and control flow structure using such APIs reveals a number of limitations that impede programmers' productivity.
In this paper we show that the design of data analysis languages and APIs from a runtime engine point of view bloats the APIs with low-level primitives and affects programmers' productivity. Instead, we argue that an approach based on deeply embedding the APIs in a host language can address the shortcomings of current data analysis languages. To demonstrate this, we propose a language for complex data analysis embedded in Scala, which (i) allows for declarative specification of dataflows and (ii) hides the notion of data parallelism and distributed runtime behind a suitable intermediate representation. We describe a compiler pipeline that facilitates efficient data-parallel processing without imposing runtime engine-bound syntactic or semantic restrictions on the structure of the input programs. We present a series of experiments with two state-of-the-art systems that demonstrate the optimization potential of our approach.
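The general idea behind such a deep embedding can be sketched with a toy collection API in plain Python (invented names; the paper's actual language is embedded in Scala): user code is written against ordinary-looking collection operations, but each call merely extends an intermediate representation, which a small "compiler" step can optimize, here by fusing adjacent maps, before anything executes.

```python
class Dataflow:
    """Toy deeply-embedded collection API: map/filter calls build an
    intermediate representation (a list of ops) instead of executing
    eagerly, so a backend can optimize before running."""
    def __init__(self, source, ops=()):
        self.source = source
        self.ops = list(ops)

    def map(self, f):
        return Dataflow(self.source, self.ops + [('map', f)])

    def filter(self, p):
        return Dataflow(self.source, self.ops + [('filter', p)])

    def collect(self):
        data = list(self.source)
        for kind, f in self._fuse(self.ops):
            data = [f(x) for x in data] if kind == 'map' else [x for x in data if f(x)]
        return data

    @staticmethod
    def _fuse(ops):
        # Simple optimization on the IR: compose consecutive maps into one.
        fused = []
        for kind, f in ops:
            if kind == 'map' and fused and fused[-1][0] == 'map':
                g = fused[-1][1]
                fused[-1] = ('map', lambda x, g=g, f=f: f(g(x)))
            else:
                fused.append((kind, f))
        return fused
```

A real embedding would reify UDFs into the IR as well (e.g. via Scala macros) and dispatch the optimized plan to a distributed engine; the point here is only that the API looks declarative while execution strategy stays behind the IR.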

Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation, authored by Max Heimel, Martin Kiefer, Volker Markl, TU Berlin, DIMA.
Quickly and accurately estimating the selectivity of multidimensional predicates is a vital part of a modern relational query optimizer. The state of the art in this field is multidimensional histograms, which offer good estimation quality but are complex to construct and hard to maintain. Kernel Density Estimation (KDE) is an interesting alternative that does not suffer from these problems. However, existing KDE-based selectivity estimators can hardly compete with the estimation quality of state-of-the-art methods.
In this paper, we substantially expand the state-of-the-art in KDE-based selectivity estimation by improving along three dimensions: First, we demonstrate how to numerically optimize a KDE model, leading to substantially improved estimates. Second, we develop methods to continuously adapt the estimator to changes in both the database and the query workload. Finally, we show how to drastically improve the performance by pushing computations onto a GPU.
We provide an implementation of our estimator and experimentally evaluate it on a variety of datasets and workloads, demonstrating that it efficiently scales up to very large model sizes, adapts itself to database changes, and typically outperforms the estimation quality of both existing Kernel Density Estimators and state-of-the-art multidimensional histograms.