
Martin Pagel

The Paper "Distributed Graph Analytics with Datalog Queries in Flink" was Accepted for Presentation at LSGDA 2020

"Distributed Graph Analytics with Datalog Queries in Flink". Muhammad Imran, Gábor Gévay, Volker Markl. To be presented at the 2nd International Workshop on Large Scale Graph Data Analytics (LSGDA 2020), held in conjunction with the 2020 VLDB Conference, Tokyo, Japan, September 4, 2020.

Large-scale, parallel graph processing has been in demand over the past decade. Succinct program structure and efficient execution are among the essential requirements of graph processing frameworks. In this paper, we present Cog, which executes Datalog programs on the Apache Flink distributed dataflow system. We chose Datalog for its compact program structure and Flink for its efficiency. We implemented a parallel semi-naive evaluation algorithm that exploits Flink's delta iteration to propagate to subsequent iterations only the tuples that need further processing. Delta iteration avoids the overhead that acyclic dataflow systems, such as Spark, incur when evaluating recursive queries, which makes recursive query evaluation more efficient. Our experiments show that Cog outperformed BigDatalog, the state-of-the-art distributed Datalog evaluation system, in most of the tests.
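To illustrate the idea behind semi-naive evaluation, here is a minimal single-machine sketch in Python. It is not Cog's implementation and does not use Flink. It evaluates the classic transitive-closure Datalog program (shown in the docstring) and, in each round, joins only the tuples that were newly derived in the previous round (the delta) against the edge relation. The function name `transitive_closure` and the variables `solution` and `delta` are assumptions chosen for this illustration.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the Datalog program

        tc(X, Y) :- edge(X, Y).
        tc(X, Y) :- tc(X, Z), edge(Z, Y).

    Illustrative sketch only; names are hypothetical, not taken from Cog.
    """
    # Index the edge relation by source vertex for the join on Z.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    solution = set(edges)  # every tc fact derived so far
    delta = set(edges)     # tc facts that were new in the previous round

    while delta:
        # Join only the delta with edge, instead of the whole solution set.
        derived = {
            (x, y)
            for (x, z) in delta
            for y in successors.get(z, ())
        }
        # Keep only facts that were not known before; these form the next delta.
        delta = derived - solution
        solution |= delta

    return solution


if __name__ == "__main__":
    print(sorted(transitive_closure([(1, 2), (2, 3), (3, 4)])))
```

The loop stops as soon as a round derives no new facts. In a distributed setting, the `solution` and `delta` sets roughly correspond to the solution set and workset that a delta iteration maintains, although the mapping in Cog may differ in detail.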

A preprint version is available here.

To learn more about LSGDA 2020, please visit