Juan Soto

19.10.2014: "Efficient Sample Generation for Scalable Meta Learning" Paper Accepted @ ICDE 2015

We are happy to announce that the paper "Efficient Sample Generation for Scalable Meta Learning" by Sebastian Schelter, Douglas Burdick, Berthold Reinwald, Alexandre Evfimievski, Juan Soto and Volker Markl has been accepted for publication at ICDE 2015 in Seoul, Korea. The paper was written in collaboration with researchers from IBM's Almaden Research Center in San Jose, California.

Meta learning techniques such as cross-validation and ensemble learning are crucial for applying machine learning to real-world use cases. These techniques first generate samples from input data, and then train and evaluate machine learning models on these samples. For meta learning on large datasets, the efficient generation of samples becomes problematic, especially when the data is stored in a distributed, block-partitioned representation and processed on a shared-nothing cluster. We present a novel, parallel algorithm for efficient sample generation from large, block-partitioned datasets in a shared-nothing architecture. This algorithm executes in a single pass over the data and minimizes inter-machine communication. The algorithm supports a wide variety of sample generation techniques through an embedded user-defined sampling function. We illustrate how to implement distributed sample generation for popular meta learning techniques such as hold-out tests, k-fold cross-validation, and bagging using our algorithm, and present an experimental evaluation on datasets with billions of data points.
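
To give a rough feel for the idea of single-pass sample generation driven by a user-defined sampling function, here is a minimal single-machine sketch in Python. It is not the algorithm from the paper. The names `generate_samples`, `hold_out`, `k_fold`, and `bagging` are hypothetical, the blocks are plain in-memory lists, and bagging is approximated by drawing an independent Poisson(1) copy count per row and bag, a common stand-in for sampling with replacement that we assume here purely for illustration.

```python
import math
import random
from collections import defaultdict


def generate_samples(blocks, sampling_fn, seed=0):
    """Visit every row exactly once and route it to the samples chosen
    by the user-defined sampling_fn (hypothetical interface)."""
    samples = defaultdict(list)
    for block_id, rows in blocks:
        # One RNG per block: each block can be processed independently,
        # e.g. on a different machine, and still be reproducible.
        rng = random.Random(seed * 1_000_003 + block_id)
        for row in rows:
            for sample_id in sampling_fn(rng):
                samples[sample_id].append(row)
    return dict(samples)


def hold_out(test_fraction):
    """Each row goes to either the test set or the training set."""
    def assign(rng):
        return ["test"] if rng.random() < test_fraction else ["train"]
    return assign


def k_fold(k):
    """Each row is a test row of one fold and a training row of the others."""
    def assign(rng):
        fold = rng.randrange(k)
        return [("test", fold)] + [("train", j) for j in range(k) if j != fold]
    return assign


def bagging(num_bags):
    """Assumption: approximate sampling with replacement by giving each
    row a Poisson(1)-distributed number of copies in every bag."""
    threshold = math.exp(-1.0)

    def poisson_one(rng):
        # Knuth's multiplication method for a Poisson draw with mean 1.
        count, product = 0, rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
        return count

    def assign(rng):
        return [bag for bag in range(num_bags) for _ in range(poisson_one(rng))]
    return assign


# Toy input: 4 blocks of 25 integer "rows" each.
blocks = [(b, list(range(b * 25, (b + 1) * 25))) for b in range(4)]
split = generate_samples(blocks, hold_out(0.2), seed=42)
folds = generate_samples(blocks, k_fold(5), seed=42)
bags = generate_samples(blocks, bagging(3), seed=42)
```

In this sketch, switching between hold-out tests, k-fold cross-validation, and bagging only means passing a different sampling function; the pass over the blocks stays the same. Because each block seeds its own random number generator, no coordination between blocks is needed to decide where a row goes.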