A. Borusan

01.11.2011, 11:00 s.t., EN-719: "Machine learning in ScalOps, a higher order cloud computing language" (Markus Weimer, Yahoo)

In this talk, I will introduce ScalOps, a new internal domain-specific language (DSL) for Big Data analytics that targets machine learning and graph-based algorithms. It unifies two so-far distinct needs in a single language and runtime: DAG processing, as found in e.g. Pig, and the iterative computation required by machine learning. It exposes a declarative language that is reminiscent of Pig, with iterative extensions. The scaloop block captures iteration and packages it in the execution plan so that it can be optimized for caching opportunities and handed off to the runtime. The Hyracks runtime directly supports these iterations as recursive queries, thereby avoiding the pitfalls of an outer driver loop. I will highlight the expressiveness of ScalOps and its amenability to optimization using a real-world, large-scale machine learning example drawn from Yahoo! Mail, one of the biggest email providers in the world.