
23.07.2018
K. Forster

"Benchmarking Distributed Data Processing Systems for Machine Learning Workloads" paper accepted at TPCTC @ VLDB 2018

The paper "Benchmarking Distributed Data Processing Systems for Machine Learning Workloads" by Christoph Boden, Tilmann Rabl, Sebastian Schelter and Volker Markl (2018) has been accepted at the Tenth TPC Technology Conference on Performance Evaluation & Benchmarking (TPCTC 2018), which takes place at the VLDB Conference 2018.

Abstract
Distributed data processing systems have been widely adopted in recent years to robustly scale out computations on massive data sets to many compute nodes. These systems are also popular choices for scaling out the execution of machine learning algorithms. However, it remains an open question how efficiently they actually perform this task and, more generally, how to adequately evaluate and benchmark these systems for scalable machine learning workloads. For example, the learning algorithms chosen in the corresponding systems papers tend to be those that fit the system's paradigm well rather than state-of-the-art methods, and the experiments often neglect important aspects, such as addressing all dimensions of scalability. In this paper, we present the requirements and all crucial building blocks of a benchmark of distributed data processing systems for scalable machine learning workloads. We outline a set of workloads, experiments and metrics that adequately and objectively assess how well data processing systems achieve the objective of scaling out machine learning algorithms.
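The abstract does not spell out which scalability metrics the benchmark uses. As a hedged illustration only, the sketch below computes two textbook measures that are commonly reported when evaluating scale-out: strong scaling (fixed data size, more nodes) and weak scaling (data size grows with the node count). The function names and the runtimes in the usage example are hypothetical placeholders, not measurements or definitions taken from the paper.

```python
"""Hypothetical sketch of standard strong- and weak-scaling metrics.

Runtimes used below are made-up placeholders, not results from the paper.
"""
from typing import Dict, Tuple


def strong_scaling(runtimes: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Fixed total data size; runtimes maps node count -> runtime in seconds.

    Speedup    S(n) = T(n_min) / T(n)
    Efficiency E(n) = S(n) * n_min / n   (1.0 means perfectly linear scaling)
    """
    n_min = min(runtimes)
    base = runtimes[n_min]
    result = {}
    for n, t in sorted(runtimes.items()):
        speedup = base / t
        result[n] = (speedup, speedup * n_min / n)
    return result


def weak_scaling(runtimes: Dict[int, float]) -> Dict[int, float]:
    """Data size grows proportionally with the node count.

    Ideal weak scaling keeps the runtime constant, so the efficiency is
    E(n) = T(n_min) / T(n).
    """
    n_min = min(runtimes)
    base = runtimes[n_min]
    return {n: base / t for n, t in sorted(runtimes.items())}


if __name__ == "__main__":
    # Placeholder runtimes (seconds) for 2, 4 and 8 nodes.
    for n, (s, e) in strong_scaling({2: 400.0, 4: 220.0, 8: 125.0}).items():
        print(f"strong n={n}: speedup={s:.2f}, efficiency={e:.2f}")
    for n, e in weak_scaling({2: 300.0, 4: 330.0, 8: 375.0}).items():
        print(f"weak   n={n}: efficiency={e:.2f}")
```

Reporting both views matters because a system can look good on one axis (e.g. near-constant runtime as data and nodes grow together) while showing poor speedup when the data size is held fixed.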

Link to the paper (preprint)