24.11.2017
A. Borusan

04.12.2017, 4 p.m. c.t. (i.e., 4:15 p.m.), TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin: "1. Database Design for NoSQL Systems" (long research presentation); "2. Modeling Strategies for Storing Data in Distributed Heterogeneous NoSQL Databases"

1. Abstract:
The heterogeneity of NoSQL data models has led to little use of traditional modeling techniques, in contrast to decades of practice with databases. Although NoSQL databases are claimed to be flexible and free of a static schema, designing the data organization requires important decisions about how to map data to the modeling elements (collections, documents, tables, columns, keys, key-value pairs) available in the target datastore. These decisions are significant because of their impact on major quality requirements such as scalability and performance.
An effective design methodology for NoSQL systems that supports these quality requirements, which are critical for next-generation Web applications, can indeed be devised. The presented approach is based on NoAM (NoSQL Abstract Model), a novel abstract data model for NoSQL databases that is used to specify a system-independent representation of the application data and that exploits the commonalities of the various NoSQL datastores.
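To make the idea of a system-independent representation more concrete, here is a minimal sketch. It assumes a simplified reading of NoAM in which a database consists of named collections, each collection holds blocks identified by a block key, and each block holds entries (entry key, value). The class and function names, the two storage strategies, and the `"<collection>/<block key>/<entry key>"` key format are all hypothetical illustrations, not NoAM's formal definition. The sketch shows how the same aggregate can be mapped to different modeling elements of a key-value target.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Block:
    """A unit of data identified by a block key; holds entry key -> value pairs."""
    key: str
    entries: dict[str, Any] = field(default_factory=dict)


@dataclass
class Collection:
    """A named set of blocks, indexed by block key."""
    name: str
    blocks: dict[str, Block] = field(default_factory=dict)


def store_as_single_entry(coll: Collection, block_key: str, aggregate: dict) -> None:
    # Hypothetical strategy A: the whole aggregate becomes one entry with an
    # empty entry key (comparable to storing one document per aggregate).
    coll.blocks[block_key] = Block(block_key, {"": dict(aggregate)})


def store_field_per_entry(coll: Collection, block_key: str, aggregate: dict) -> None:
    # Hypothetical strategy B: each top-level field becomes its own entry
    # (comparable to one column or one key-value pair per field).
    coll.blocks[block_key] = Block(block_key, dict(aggregate))


def to_key_value_pairs(coll: Collection) -> dict[str, Any]:
    # Illustrative mapping to a key-value store:
    # key = "<collection>/<block key>" plus "/<entry key>" when the entry key is non-empty.
    pairs: dict[str, Any] = {}
    for block in coll.blocks.values():
        for entry_key, value in block.entries.items():
            suffix = f"/{entry_key}" if entry_key else ""
            pairs[f"{coll.name}/{block.key}{suffix}"] = value
    return pairs


player = {"name": "Mary", "score": 42}

as_one = Collection("Player")
store_as_single_entry(as_one, "mary", player)
print(to_key_value_pairs(as_one))
# {'Player/mary': {'name': 'Mary', 'score': 42}}

per_field = Collection("Player")
store_field_per_entry(per_field, "mary", player)
print(to_key_value_pairs(per_field))
# {'Player/mary/name': 'Mary', 'Player/mary/score': 42}
```

The two strategies trade off differently: a single entry favors reading or writing the whole aggregate at once, while one entry per field allows individual fields to be accessed or updated separately.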

2. Abstract:
Data management has become an essential functionality of modern information systems.
With the rise of digital environments, the volume of data generated and available has grown rapidly, giving rise to the Big Data era. NoSQL systems have been introduced to handle these large volumes of data while providing availability, scalability, and efficiency. There is considerable heterogeneity among NoSQL systems: different data models, different APIs, different implementations. Moreover, data modeling for NoSQL systems is not formalized, mainly because of the flexible, semi-structured nature of their models. Recent research has shown how modeling decisions affect quality requirements such as scalability and performance.
In this work we propose HerM (Heterogeneous Distributed Model), a NoSQL data modeling approach that supports the use of multiple heterogeneous NoSQL systems in a distributed environment. We define the conceptual elements necessary for data modeling and identify optimized data distribution patterns. We also map HerM to a physical model that improves the performance of distributed joins.
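One common way a data distribution pattern can speed up distributed joins is co-partitioning: if both inputs are distributed across nodes by hashing the join key with the same function, matching records always end up on the same node, and each node can join its local data independently. The sketch below illustrates that generic technique only. It is not HerM's physical model, the function names are hypothetical, and the "nodes" are simulated as in-memory partitions.

```python
import zlib
from collections import defaultdict


def node_for(value, n_nodes: int) -> int:
    # Deterministic placement: crc32 is used instead of hash(), which is
    # salted per process for strings and would change placement between runs.
    return zlib.crc32(str(value).encode("utf-8")) % n_nodes


def partition(rows, key, n_nodes):
    # Distribute rows across simulated nodes by hashing the join key.
    parts = defaultdict(list)
    for row in rows:
        parts[node_for(row[key], n_nodes)].append(row)
    return parts


def local_hash_join(left, right, key):
    # Classic hash join: build an index on one side, probe it with the other.
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def co_partitioned_join(left, right, key, n_nodes):
    # Both inputs use the same partitioning function on the join key, so each
    # node joins only its own partitions; no rows are exchanged during the join.
    left_parts = partition(left, key, n_nodes)
    right_parts = partition(right, key, n_nodes)
    result = []
    for node in range(n_nodes):
        result.extend(local_hash_join(left_parts.get(node, []), right_parts.get(node, []), key))
    return result


customers = [{"cid": 1, "name": "Ada"}, {"cid": 2, "name": "Bob"}]
orders = [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 2},
          {"oid": 12, "cid": 1}, {"oid": 13, "cid": 3}]

joined = co_partitioned_join(orders, customers, "cid", n_nodes=3)
print(sorted((r["oid"], r["name"]) for r in joined))
# [(10, 'Ada'), (11, 'Bob'), (12, 'Ada')]
```

The order with `cid` 3 has no matching customer and is dropped, exactly as in a non-distributed inner join; only the placement of the work changes.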
We implemented a flexible framework in which we deployed the proposed modeling strategies. The framework provides a transparent interface for efficiently accessing the underlying heterogeneous systems and makes it easy to configure different use cases. We provide a detailed evaluation of the framework, comparing it with a native MongoDB implementation in different scenarios on a large dataset with respect to performance and stability.
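A transparent interface over heterogeneous stores can be pictured as a single entry point that routes each collection to a configured backend, so client code never depends on which system holds the data. The sketch below is a hypothetical illustration, not the framework described above: `Store`, `Router`, and both in-memory backends are invented stand-ins for a document store and a key-value store, and the `placement` mapping plays the role of a per-use-case configuration.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class Store(ABC):
    """Common interface every backend must implement (hypothetical)."""

    @abstractmethod
    def put(self, collection: str, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, collection: str, key: str) -> Optional[dict]: ...


class InMemoryDocumentStore(Store):
    """Stand-in for a document database: keeps values as nested dicts."""

    def __init__(self):
        self._data: dict[str, dict[str, dict]] = {}

    def put(self, collection, key, value):
        self._data.setdefault(collection, {})[key] = dict(value)

    def get(self, collection, key):
        doc = self._data.get(collection, {}).get(key)
        return dict(doc) if doc is not None else None


class InMemoryKeyValueStore(Store):
    """Stand-in for a key-value database: values are opaque JSON strings."""

    def __init__(self):
        self._data: dict[str, str] = {}

    def put(self, collection, key, value):
        self._data[f"{collection}:{key}"] = json.dumps(value)

    def get(self, collection, key):
        raw = self._data.get(f"{collection}:{key}")
        return json.loads(raw) if raw is not None else None


class Router:
    """Single entry point; a placement config maps each collection to a backend."""

    def __init__(self, backends: dict[str, Store], placement: dict[str, str]):
        self._backends = backends
        self._placement = placement

    def _store(self, collection: str) -> Store:
        return self._backends[self._placement[collection]]

    def put(self, collection: str, key: str, value: dict) -> None:
        self._store(collection).put(collection, key, value)

    def get(self, collection: str, key: str) -> Optional[dict]:
        return self._store(collection).get(collection, key)


router = Router(
    backends={"doc": InMemoryDocumentStore(), "kv": InMemoryKeyValueStore()},
    placement={"users": "doc", "sessions": "kv"},  # hypothetical use-case configuration
)
router.put("users", "u1", {"name": "Ada"})
router.put("sessions", "s1", {"user": "u1", "ttl": 3600})
print(router.get("users", "u1"))     # {'name': 'Ada'}
print(router.get("sessions", "s1"))  # {'user': 'u1', 'ttl': 3600}
```

Switching a collection to a different backend only requires changing the `placement` mapping; the calls to `router.put` and `router.get` stay the same.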