
K. Forster

"Fault-Tolerance for Distributed Iterative Dataflows in Action" Paper is published in PVLDB

Fault-Tolerance for Distributed Iterative Dataflows in Action, Chen Xu, Rudi Poepsel Lemaitre, Juan Soto, and Volker Markl. Proceedings of the VLDB Endowment (PVLDB), vol. 11, no. 12, pp. 1990-1993, 2018. DOI:

Abstract:
Distributed data flow systems (DDS) are widely employed in graph processing and machine learning (ML), where many algorithms are iterative in nature. Typically, DDS achieve fault-tolerance using checkpointing mechanisms, or they exploit algorithmic properties to enable fault-tolerance without the need for checkpoints. Recently, for graph processing, we proposed unblocking checkpointing, which parallelizes the execution pipeline and checkpoint writing, as well as confined recovery, which enables fast recovery upon partial node failures. Furthermore, for ML algorithms implemented using broadcast variables, we proposed replica recovery, which leverages broadcast variable replicas to facilitate checkpoint-free failure recovery. In this demonstration, we showcase these fault-tolerance techniques using Apache Flink. Attendees will be able to: (i) run representative iterative algorithms, including PageRank, Connected Components, and K-Means; (ii) explore the internal behavior of DDS under the influence of unblocking checkpointing; and (iii) trigger failures to observe the effects of confined recovery and replica recovery.
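To make the idea behind replica recovery more concrete, here is a minimal single-process sketch in Python. It rests on our own simplifying assumptions, not on the paper's Flink implementation: the K-Means centroids play the role of the broadcast variable, each simulated worker holds one replica of them, and input partitions are assumed to be re-readable from durable storage. The names run_kmeans, assign_and_sum, fail_worker, and fail_at are hypothetical. When a worker "fails", its centroid replica is copied from a surviving worker and its partition is reloaded, so no checkpoint is consulted.

```python
"""Toy simulation of checkpoint-free replica recovery for K-Means.

Assumptions (ours, not the paper's): workers are simulated in one process,
input partitions can be re-read from durable storage, and the centroids are
the broadcast variable replicated on every worker.
"""
import copy
import random


def assign_and_sum(points, centroids):
    """Per-worker map step: partial coordinate sums and counts per centroid."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance.
        best = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return sums, counts


def run_kmeans(partitions, init, iterations, fail_worker=None, fail_at=None):
    """Run K-Means; optionally lose one worker's state at iteration fail_at."""
    n = len(partitions)
    k, dim = len(init), len(init[0])
    # Each worker holds its own replica of the broadcast centroids.
    replicas = [copy.deepcopy(init) for _ in range(n)]
    local_data = [list(p) for p in partitions]
    for it in range(iterations):
        if it == fail_at:
            # Simulated failure: the worker loses its data and its replica.
            local_data[fail_worker] = None
            replicas[fail_worker] = None
            # Replica recovery: copy the broadcast state from any surviving
            # worker instead of restoring it from a checkpoint ...
            donor = next(w for w in range(n) if replicas[w] is not None)
            replicas[fail_worker] = copy.deepcopy(replicas[donor])
            # ... and reload the input partition (assumed durable).
            local_data[fail_worker] = list(partitions[fail_worker])
        partials = [assign_and_sum(local_data[w], replicas[w]) for w in range(n)]
        new_centroids = []
        for c in range(k):
            count = sum(counts[c] for _, counts in partials)
            if count == 0:
                # Empty cluster: keep its previous centroid.
                new_centroids.append(list(replicas[0][c]))
            else:
                new_centroids.append(
                    [sum(sums[c][d] for sums, _ in partials) / count
                     for d in range(dim)]
                )
        # Broadcast: every worker receives a fresh replica of the new model.
        replicas = [copy.deepcopy(new_centroids) for _ in range(n)]
    return replicas[0]


if __name__ == "__main__":
    random.seed(7)
    centers = [(0, 0), (5, 5), (0, 5)]
    points = [(random.gauss(cx, 0.5), random.gauss(cy, 0.5))
              for cx, cy in centers for _ in range(30)]
    partitions = [points[i::4] for i in range(4)]  # 4 simulated workers
    init = [list(points[0]), list(points[30]), list(points[60])]
    clean = run_kmeans(partitions, init, iterations=10)
    recovered = run_kmeans(partitions, init, iterations=10,
                           fail_worker=2, fail_at=4)
    print(clean == recovered)  # True: same model, no checkpoint needed
```

Because every worker holds an identical copy of the broadcast state at the start of an iteration, copying it from any survivor restores exactly what was lost, and the recovered run reproduces the failure-free result. The sketch deliberately ignores network transfer, failures in the middle of an iteration, and Flink's actual runtime.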
