
27.01.2021
Martin Pagel

The Paper "Hybrid Evaluation for Distributed Iterative Matrix Computation" was Accepted for Presentation at SIGMOD 2021

"Hybrid Evaluation for Distributed Iterative Matrix Computation." Zihao Chen, Chen Xu, Juan Soto, Volker Markl, Weining Qian, Aoying Zhou. To be presented at ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD/PODS 2021), June 20-25, 2021, Xi'an, Shaanxi, China.

Abstract:
Distributed matrix computation is common in large-scale data processing and machine learning applications. Many iterative-convergent algorithms involving matrix computation share a common property: parameters converge non-uniformly. This property can be exploited to eliminate computational redundancy via incremental evaluation. Existing systems that support distributed matrix computation already explore incremental evaluation. However, they are oblivious to the fact that non-zero increments are scattered in different blocks in a distributed environment. Additionally, we observe that incremental evaluation does not always outperform full evaluation. To address these issues, we propose matrix reorganization to optimize the physical layout upon the state-of-the-art optimized partition schemes, and thereby accelerate the incremental evaluation. More importantly, we propose a hybrid evaluation to efficiently interleave full and incremental evaluation during the iterative process. In particular, it employs a cost model to compare the overhead costs of the two types of evaluation and a selective comparison mechanism to reduce the overhead incurred by the comparison itself. To demonstrate the efficiency of our techniques, we implement HyMAC, a hybrid matrix computation system based on SystemML. Our experiments show that HyMAC reduces execution time on large datasets by 23% on average in comparison to the state-of-the-art optimization technique and consequently outperforms SystemML, ScaLAPACK, and SciDB by an order of magnitude.
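The abstract does not spell out HyMAC's cost model, so what follows is only a rough, single-machine illustration of the general idea behind hybrid evaluation, not the paper's method. For an iterative product y = A·x, full evaluation recomputes A·x from scratch, while incremental evaluation updates the previous result with A·Δx, touching only the columns whose increment is non-zero. A toy cost model then picks whichever strategy touches fewer non-zeros. The column-oriented sparse layout, the function names (`full_eval`, `incremental_eval`, `choose_eval`, `hybrid_step`), and the "count touched non-zeros" cost model are hypothetical assumptions made for this sketch. They are not HyMAC's or SystemML's API, and the sketch models neither distribution, block layout, nor the selective comparison mechanism.

```python
from typing import Dict, List, Tuple

# Hypothetical column-oriented sparse matrix: column index -> [(row, value), ...]
SparseCols = Dict[int, List[Tuple[int, float]]]


def full_eval(cols: SparseCols, x: List[float], n_rows: int) -> List[float]:
    """Full evaluation: recompute y = A @ x from scratch (touches every non-zero)."""
    y = [0.0] * n_rows
    for j, entries in cols.items():
        for i, a in entries:
            y[i] += a * x[j]
    return y


def incremental_eval(cols: SparseCols, y_prev: List[float],
                     delta: Dict[int, float]) -> List[float]:
    """Incremental evaluation: y = y_prev + A @ delta, visiting only the
    columns whose increment is non-zero."""
    y = list(y_prev)
    for j, d in delta.items():
        for i, a in cols.get(j, []):
            y[i] += a * d
    return y


def choose_eval(cols: SparseCols, delta: Dict[int, float]) -> str:
    """Toy cost model (an assumption, not HyMAC's): compare the number of
    non-zeros each strategy would touch."""
    full_cost = sum(len(entries) for entries in cols.values())
    incr_cost = sum(len(cols.get(j, [])) for j in delta)
    return "incremental" if incr_cost < full_cost else "full"


def hybrid_step(cols: SparseCols, x_new: List[float], x_prev: List[float],
                y_prev: List[float], n_rows: int) -> List[float]:
    """One hybrid step: build the sparse increment, then use the cheaper strategy."""
    delta = {j: x_new[j] - x_prev[j]
             for j in range(len(x_new)) if x_new[j] != x_prev[j]}
    if choose_eval(cols, delta) == "incremental":
        return incremental_eval(cols, y_prev, delta)
    return full_eval(cols, x_new, n_rows)
```

For example, with `cols = {0: [(0, 2.0), (1, 1.0)], 1: [(1, 3.0)], 2: [(0, 1.0), (2, 4.0)]}` and `x_prev = [1.0, 1.0, 1.0]`, changing only `x[2]` lets `hybrid_step` take the incremental path (2 non-zeros instead of 5). If every entry changes, it falls back to full evaluation, mirroring the observation that incremental evaluation does not always win.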

The annual ACM SIGMOD/PODS Conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences in all aspects of data management. To learn more about SIGMOD/PODS, please visit https://2021.sigmod.org/.

A preprint version of the paper is available for download.