Datenschutzerklärung|Data Privacy

C. Boden

Short Paper "Distributed Matrix Factorization with MapReduce using a series of Broadcast-Joins" accepted at ACM RecSys 2013

The short paper "Distributed Matrix Factorization with MapReduce using a series of Broadcast-Joins" by Sebastian Schelter, Christoph Boden, Martin Schenck, Alexander Alexandrov and Volker Markl has been accepted for presentation and publication at the ACM Recommender Systems conference (RecSys 2013) in Hong Kong.


The efficient, distributed factorization of large matrices on clusters of commodity machines is crucial to applying latent factor models in industrial-scale recommender systems. We propose an efficient, data-parallel low-rank matrix factorization with Alternating Least Squares which uses a series of broadcast-joins that can be efficiently executed with MapReduce.

We empirically show that the performance of our solution is suitable for real-world use cases. We present experiments on two publicly available datasets and on an artificial dataset termed Bigflix , generated from the Netflix dataset. Bigflix contains 25~million users and more than 5~billion ratings, mimicking data sizes recently reported as Netflix' production workload. We demonstrate that our approach is able to run an iteration of Alternating Least Squares in six minutes on this dataset. Our implementation has been contributed to the open source machine learning library Apache Mahout.