15.09.2017
Andreas Kunft

Andreas Kunft presented "Efficiently Executing R Dataframes on Flink" at Flink Forward 2017

Andreas Kunft presented the talk "Efficiently Executing R Dataframes on Flink" at this year's Flink Forward in Berlin.

Link:
https://berlin.flink-forward.org/kb_sessions/efficiently-executing-r-dataframes-on-flink

Abstract:
While dataflow engines offer scalability, their programming abstractions are often unfamiliar to data scientists, who are used to Python and R. To provide a more convenient interface, dataflow engines like Spark offer an R-like dataframe abstraction. While operations without user-defined code can be executed efficiently, the execution of user-defined functions (UDFs) is dominated by serialized data exchange between the dataflow engine and an external R process that evaluates the code. We present a new approach to executing UDFs that uses the Truffle/Graal compiler infrastructure, which enables efficient execution of dynamic languages on the JVM. Based on fastR, the R language implementation provided by this infrastructure, we demonstrate the execution of R scripts directly inside Flink's data pipelines, without data serialization or inter-process communication. Furthermore, we discuss future opportunities and problems, and compare our approach to native Flink, Spark, and SparkR.