A. Borusan

14.05.2012, 16:00 c.t., TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin: "If Pigs Could Fly: Integrating Apache Pig and Stratosphere" (Vasiliki Kalavri, KTH, Sweden)

Writing efficient applications in MapReduce or PACT requires strong programming skills and an in-depth understanding of the systems’ architectures. To make the power of these systems accessible to non-experts, save development time, and make application code easier to understand and maintain, several high-level languages have been developed.
One of the most popular high-level dataflow systems is Apache Pig. Pig overcomes Hadoop’s one-input and two-stage dataflow limitations and lets the developer write SQL-like scripts. However, Hadoop's limitations are still present in the backend system and add notable overhead to the execution time. Pig is currently implemented on top of Hadoop, but it has been designed to be modular and independent of the execution engine.
For my thesis project, I am currently working on integrating Pig and Stratosphere. I believe that Stratosphere has desirable properties that will significantly improve Pig's performance. In this talk, I will present the goal, motivation, and expectations of my project. I will give an introduction to the Pig system internals, i.e., the data model, the compilers, and the optimizers. I will also cover the integration methodology, the integration alternatives, the challenges faced, and the design decisions made. Finally, I will briefly present the planned evaluation strategy.