
Martin Pagel

The Paper "Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing" Was Accepted for Publication at BTW 2021

"Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing". Alexander Kumaigorodski, Clemens Lutz, Volker Markl. To be presented at the 19th Fachtagung für Datenbanksysteme für Business, Technologie und Web (BTW 2021), September 20-24, 2021.

Comma-separated values (CSV) is a widely used format for data exchange. Due to the format’s prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input.
However, loading CSV input close to the speed of I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs are capable of sustaining high transfer rates of 100 Gbit/s and higher. At the same time, CSV parsing performance is limited by the complex control flows that its semi-structured and text-based layout incurs.
In this paper, we propose to speed up loading CSV input using GPUs. We devise a new parsing approach that streamlines the control flow while correctly handling context-sensitive CSV features such as quotes. By offloading I/O and parsing to the GPU, our approach enables databases to load CSVs at high throughput from main memory with NVLink 2.0, as well as directly from the network with RDMA. In our evaluation, we show that GPUs parse real-world datasets at up to 60 GB/s, thereby saturating high-bandwidth I/O devices.
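To illustrate why quotes make CSV context-sensitive, the following is a minimal sketch of one well-known way to resolve them without character-by-character branching on parser state: count the quote characters seen so far with a prefix sum, and treat a comma or newline as structural only when that count is even. This is an illustration under our own assumptions, not the algorithm from the paper. The function names `split_csv_quote_parity` and `_unquote` are hypothetical, the code runs sequentially on the CPU, and it assumes quotes are escaped by doubling them.

```python
from itertools import accumulate


def _unquote(field: str, quote: str) -> str:
    """Strip enclosing quotes and collapse doubled quotes (hypothetical helper)."""
    if len(field) >= 2 and field[0] == quote and field[-1] == quote:
        return field[1:-1].replace(quote * 2, quote)
    return field


def split_csv_quote_parity(data: str, delim: str = ",", quote: str = '"'):
    """Split CSV text into records of fields using a quote-parity scan.

    Illustrative sketch only; assumes quotes are escaped by doubling.
    """
    # Step 1: flag every quote character (1 for a quote, 0 otherwise).
    is_quote = [1 if ch == quote else 0 for ch in data]
    # Step 2: inclusive prefix sum; an odd count means "inside a quoted field".
    quote_count = list(accumulate(is_quote))

    # Step 3: a delimiter or newline is structural only at an even quote count.
    records, fields, start = [], [], 0
    for i, ch in enumerate(data):
        if quote_count[i] % 2 == 0 and ch in (delim, "\n"):
            fields.append(_unquote(data[start:i], quote))
            start = i + 1
            if ch == "\n":
                records.append(fields)
                fields = []

    # Flush a final record that is not terminated by a newline.
    if start < len(data) or fields:
        fields.append(_unquote(data[start:], quote))
    if fields:
        records.append(fields)
    return records


sample = 'id,comment\n1,"hello, world"\n2,"she said ""hi"""\n'
result = split_csv_quote_parity(sample)
```

Steps 1 and 2 are a map followed by a prefix sum, both of which have standard data-parallel formulations. That is the reason this formulation is often used as a starting point for parallel parsers, although the sketch itself makes no claim about GPU performance.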

A preprint version of the paper is available here.

To learn more about BTW 2021, please visit