
Juan Soto

17.07.2014: "Composite Key Generation on a Shared Nothing Architecture" Paper Accepted at TPCTC 2014

The paper will be presented at the upcoming 6th Technology Conference on Performance Evaluation and Benchmarking (TPCTC 2014), collocated with VLDB 2014 in Hangzhou, PRC. The conference proceedings will be published by Springer-Verlag as part of the Lecture Notes in Computer Science (LNCS) series.

Authored by: Marie Hoffmann (TUB), Alexander Alexandrov (TUB), Periklis Andritsos (University of Lausanne), Juan Soto (TUB), and Volker Markl (TUB)

Abstract. Generating synthetic data sets is integral to benchmarking, debugging, and simulating future scenarios. As data sets become larger, real data characteristics thereby become necessary for the success of new algorithms. Recently introduced software systems allow for synthetic data generation that is truly parallel. The systems use fast pseudorandom number generators and can handle complex schemas and uniqueness constraints on single attributes. Uniqueness is essential for forming keys, which identify single entries in a database instance. The uniqueness property is usually guaranteed by sampling from a uniform distribution and adjusting the sample size to the output size of the table such that there are no collisions. However, when it comes to real composite keys, where only the combination of the key attributes has the uniqueness property, a different strategy is required. In this paper, we present a novel approach to generating composite keys within a parallel data generation framework. We compute a joint probability distribution that incorporates the distributions of the key attributes and use the unique sequence positions of entries to address distinct values in the key domain.
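The idea of addressing distinct composite keys by sequence position can be illustrated with a small sketch. It is not the algorithm from the paper: it is a simplified toy for a two-attribute key (A, B), and the names `make_key_function`, `rows_per_a`, and `size_b` are hypothetical. It assumes the number of rows for each value of A has already been derived from some target distribution, and it maps each row position to a pair that no other position can produce, so workers holding disjoint position ranges need no coordination.

```python
from bisect import bisect_right
from itertools import accumulate


def make_key_function(rows_per_a, size_b):
    """Build a position -> (a, b) mapping for a two-attribute composite key.

    rows_per_a[a] is the (assumed, precomputed) number of rows that carry
    value a in the first key attribute; size_b is the domain size of the
    second attribute. Both names are illustrative, not from the paper.
    """
    if any(count > size_b for count in rows_per_a):
        raise ValueError("a value of A needs more distinct partners than B offers")

    # ends[a] is the exclusive end of the position range assigned to value a.
    ends = list(accumulate(rows_per_a))
    total = ends[-1] if ends else 0

    def key_at(position):
        if not 0 <= position < total:
            raise IndexError(position)
        # Find the bucket of A whose position range contains `position`.
        a = bisect_right(ends, position)
        start = ends[a] - rows_per_a[a]
        # The offset inside the bucket selects a distinct value of B.
        b = position - start
        return a, b

    return key_at, total


if __name__ == "__main__":
    key_at, total = make_key_function(rows_per_a=[3, 2, 1], size_b=4)
    # Two shared-nothing workers, each owning a disjoint range of positions.
    worker_0 = [key_at(i) for i in range(0, 3)]
    worker_1 = [key_at(i) for i in range(3, total)]
    print(worker_0, worker_1)
```

Because each position falls into exactly one bucket and has exactly one offset within it, the mapping is injective, so uniqueness holds without comparing keys across workers. In this toy the offsets always pick the smallest values of B; a real generator would presumably also shape or permute the second attribute, which is omitted here.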