A. Borusan

14.12.2011, 11:00 c.t., Room: EN 719, TU Berlin, Einsteinufer 17, 10587 Berlin: "A Hybrid Approach to Physical Data Placement in Relational Database Systems" (Daniel Bößwetter (FU Berlin))

In recent years, relational database technology has undergone a
diversification process: while major database vendors have refined a
single architecture for all data processing purposes over four decades, it
has now become evident in research and practice that this architecture no
longer fits current hardware or today's requirements. Column-oriented
physical data models and execution strategies have gained enormous
interest in research and industry. Column-orientation is known to
increase the execution speed of analytical relational queries (OLAP) that
require few attributes from many tuples instead of all attributes of few
tuples, as is typical for transactional workloads (OLTP). Moreover,
it supports data compression, which leads to higher throughput
for column scans, which are common in data warehouse execution plans. On
the other hand, column-orientation and compression are counterproductive
for transactional processing, because each update potentially leads to
many writes or, even worse, to an expensive reorganization of compressed
data. This results in the typical two-tier approach, with one (or more)
row-oriented, possibly main-memory-based systems responsible
for transaction processing and a separate column-oriented system for
the analytics. Data is transferred from the former to the latter in
regular intervals by an extract-transform-load (ETL) process. While
having two independent systems for OLAP and OLTP may have advantages,
it is not always desirable. Real-time businesses demand analytical
queries on up-to-date data, so a nightly ETL process might be
insufficient. High-volume updates, as found in telecommunication systems,
might even be too large to be imported into the data warehouse in time. It
is thus an open research question whether a reunification of relational
systems into a single architecture for both requirements is possible,
such that data can be analyzed directly at its source. This talk deals
with hybrid data placement strategies that support both workloads,
trading off the performance of one against the other.
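
The row- versus column-orientation trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the talk: the table, its column names (`id`, `region`, `amount`), and the helper functions `to_columns`, `run_length_encode`, and `fetch_tuple` are assumptions made up for illustration, and run-length encoding merely stands in for whatever compression scheme a real system would use.

```python
# Hypothetical row-oriented table: one dict per tuple (illustrative data only).
rows = [
    {"id": 1, "region": "north", "amount": 10},
    {"id": 2, "region": "north", "amount": 20},
    {"id": 3, "region": "south", "amount": 5},
    {"id": 4, "region": "south", "amount": 15},
]


def to_columns(rows):
    """Convert a row-oriented table into a column-oriented one (one list per attribute)."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Compress a column into [value, run_length] pairs (a simple stand-in for real compression)."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def fetch_tuple(columns, index):
    """Reassemble one full tuple from the column layout; this touches every column."""
    return {name: values[index] for name, values in columns.items()}


columns = to_columns(rows)

# OLAP-style query: an aggregate over many tuples reads only the "amount" column.
total = sum(columns["amount"])

# Columns with repeated values compress well; an in-place update, however,
# may split a run and force the encoded column to be rewritten.
encoded_region = run_length_encode(columns["region"])

# OLTP-style access: reading one complete tuple has to visit every column.
third_tuple = fetch_tuple(columns, 2)
```

In this sketch the aggregate reads a single list, whereas reassembling one tuple touches every column, which mirrors the OLAP/OLTP tension the abstract describes.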