02.11.2015

A. Borusan

## 19.11.2015, 10 a.m. c.t., TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin: "Large-Scale Machine Learning With The SimSQL System" (Prof. Chris Jermaine, Rice University)

In this talk, I’ll describe the SimSQL system, which is a platform for writing and executing statistical codes over large data sets, particularly for machine learning applications. Codes that run on SimSQL can be written in a very high-level, declarative language called Buds. A Buds program looks a lot like a mathematical specification of an algorithm, and statistical codes written in Buds are often just a few lines long.

At its heart, SimSQL is a relational database system, and like other relational systems, it is designed to support data independence. That is, a single declarative code for a particular statistical inference problem can be used regardless of data set size, compute hardware, and physical data storage and distribution across machines. One concern is that a platform supporting data independence will not perform well. But we’ve done extensive experimentation and have found that SimSQL performs as well as other competitive platforms that support writing and executing machine learning codes for large data sets.