Chen Xu

21.12.2015: Paper "Efficient Fault-tolerance for Iterative Graph Processing on Distributed Dataflow Systems" accepted by ICDE 2016

We are happy to announce that the paper "Efficient Fault-tolerance for Iterative Graph Processing on Distributed Dataflow Systems" by Chen Xu (TU Berlin), Markus Holzemer (TU Berlin), Manohar Kaul (IIT Hyderabad), and Volker Markl (TU Berlin) has been accepted for publication at ICDE 2016 in Helsinki, Finland.

Abstract: Real-world graph processing applications in many instances require combining the graph data with tabular data or making the graph processing a part of a larger analytics pipeline. General-purpose distributed dataflow frameworks typically execute such pipelines while analyzing the entire pipeline in a holistic manner to further optimize the processing. A majority of big graph processing algorithms are iterative in nature and incur a long runtime; therefore, it becomes all the more necessary to tolerate and recover quickly from any intermittent failures. In this work, we propose an efficient fault-tolerance mechanism for iterative graph processing on distributed dataflow systems with the objective of reducing the checkpointing cost and failure recovery time. Rather than writing checkpoints that block downstream operators, we write checkpoints in an unblocking manner. Also, in comparison to the typical unblocking checkpointing approach of managing checkpoints independently, we inject the checkpoints into the dataflow itself. This approach not only inherits the advantage of a low execution latency without breaking the pipelined tasks, but also simplifies the system design for coordinating the checkpoint writing. Further, we achieve speedier recovery, i.e., confined recovery, by using the local log files on each node to avoid a complete re-computation from scratch. Our theoretical studies as well as experimental analysis on Flink give further insight into our fault-tolerance strategies and show that they are more efficient than blocking checkpointing and complete recovery for iterative graph processing on dataflow systems.
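
To give a rough feel for the confined recovery idea mentioned in the abstract, here is a minimal, self-contained Python sketch. It is not the paper's implementation and does not use Flink's API. It assumes a toy setting: a synchronous min-label propagation (connected components) over a small path graph split across three simulated workers, where each worker keeps a local log of the messages it sent in every superstep. All names (`owner`, `send`, `apply`, `run`, `confined_recovery`) are hypothetical, and the unblocking checkpoint writing itself is not modeled; the checkpoint is simply a copy of the state.

```python
from copy import deepcopy

NUM_WORKERS = 3  # assumption: three simulated workers


def owner(v):
    """Hypothetical partitioning: vertex v lives on worker v % NUM_WORKERS."""
    return v % NUM_WORKERS


def send(state, adj):
    """Messages (dst, label) emitted by the vertices of one partition."""
    return [(dst, label) for v, label in state.items() for dst in adj[v]]


def apply(state, messages):
    """Each vertex keeps the minimum of its label and the labels it received.

    Messages addressed to vertices of other partitions are ignored.
    """
    new_state = dict(state)
    for dst, label in messages:
        if dst in new_state and label < new_state[dst]:
            new_state[dst] = label
    return new_state


def run(adj, supersteps, checkpoint_at):
    """Failure-free run that logs outgoing messages and takes one checkpoint."""
    states = [{v: v for v in adj if owner(v) == w} for w in range(NUM_WORKERS)]
    logs = [dict() for _ in range(NUM_WORKERS)]  # per-worker local message logs
    checkpoint = None
    for step in range(1, supersteps + 1):
        outgoing = [send(states[w], adj) for w in range(NUM_WORKERS)]
        for w in range(NUM_WORKERS):
            logs[w][step] = outgoing[w]
        all_msgs = [m for out in outgoing for m in out]
        states = [apply(states[w], all_msgs) for w in range(NUM_WORKERS)]
        if step == checkpoint_at:
            checkpoint = deepcopy(states)
    return states, logs, checkpoint


def confined_recovery(failed, adj, logs, checkpoint, checkpoint_at, last_step):
    """Rebuild only the failed partition.

    The surviving workers do not roll back: their logged messages are
    replayed, and only the failed partition's own messages are recomputed.
    """
    state = dict(checkpoint[failed])
    for step in range(checkpoint_at + 1, last_step + 1):
        replayed = [m for w in range(NUM_WORKERS) if w != failed
                    for m in logs[w][step]]
        recomputed = send(state, adj)
        state = apply(state, replayed + recomputed)
    return state


# Path graph 0 - 1 - 2 - 3 - 4 - 5 - 6 as an adjacency list.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}

states, logs, checkpoint = run(adj, supersteps=5, checkpoint_at=2)

# Pretend worker 1 loses its state after superstep 5 and recover it.
recovered = confined_recovery(1, adj, logs, checkpoint,
                              checkpoint_at=2, last_step=5)
print(recovered == states[1])
```

In this toy setting, the recovered partition matches the state the failed worker held before the failure, while workers 0 and 2 never redo any superstep. A complete recovery would instead reset all three partitions to the checkpoint and recompute supersteps 3 to 5 everywhere.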