Catalog data as of 10 December 2024
Verlag Dr. Hut GmbH, Sternstr. 18, 80538 München, Tel: 0175 / 9263392, Mon - Fri, 9 am - 12 noon
ISBN 978-3-8439-0927-3, series: Informatik (Computer Science)
Sebastian Bächle: Separating Key Concerns in Query Processing – Set Orientation, Physical Data Independence, and Parallelism
257 pages, dissertation, Technische Universität Kaiserslautern (2012), softcover, A5
Declarative query languages are the most convenient and most productive abstraction for interacting with complex data management systems. While the developer can focus on the application logic, the compiler takes care of translating and optimizing a query for efficient execution. Today, applications increasingly call for declarative data management for many novel storage designs and system architectures.
Building a query processing system for every new kind of storage, language, or data model is a complex and time-consuming task. It takes considerable effort to design, implement, test, and optimize a compiler that utilizes the system optimally. A large part of this work is devoted to porting and adapting proven algorithms and optimizations from existing solutions.
This thesis studies the design of a compiler and runtime infrastructure for consolidating these development efforts. It aims at a decoupled organization of the main concerns of every query processing system.
Set orientation is a key concept for efficiently processing large amounts of data. We develop an intermediate representation for compiling queries and scripts with arbitrary nestings. It is based on the idea of composing higher-order functions into relational-style processing pipelines, which allows us to apply common set-oriented optimizations independently of the concrete data model used.
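The idea of composing higher-order functions into a relational-style pipeline can be illustrated with a minimal Python sketch. It is not the intermediate representation developed in the thesis; the names `select`, `project`, `for_each`, and `pipeline`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# A stage maps a stream of tuples (modeled as dicts) to a new stream.
Stage = Callable[[Iterable[dict]], Iterator[dict]]

def select(pred: Callable[[dict], bool]) -> Stage:
    """Keep only the tuples that satisfy the predicate."""
    def stage(rows: Iterable[dict]) -> Iterator[dict]:
        return (r for r in rows if pred(r))
    return stage

def project(*fields: str) -> Stage:
    """Keep only the named fields of each tuple."""
    def stage(rows: Iterable[dict]) -> Iterator[dict]:
        return ({f: r[f] for f in fields} for r in rows)
    return stage

def for_each(expand: Callable[[dict], Iterable[dict]]) -> Stage:
    """Unnest: emit one output tuple per nested binding of each input tuple."""
    def stage(rows: Iterable[dict]) -> Iterator[dict]:
        return (dict(r, **b) for r in rows for b in expand(r))
    return stage

def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right into a single stage."""
    def run(rows: Iterable[dict]) -> Iterator[dict]:
        return reduce(lambda acc, s: s(acc), stages, rows)
    return run

# Hypothetical nested input data.
orders = [
    {"id": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 5}]},
    {"id": 2, "items": [{"sku": "c", "qty": 7}]},
]

query = pipeline(
    for_each(lambda r: r["items"]),   # iterate over the nested items
    select(lambda r: r["qty"] > 3),   # filter on a nested value
    project("id", "sku"),             # shape the result
)
result = list(query(orders))
```

Because every stage only sees a stream of tuples, a rewrite such as moving `select` ahead of a later stage can be expressed on the pipeline itself, without knowing how `orders` is stored.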
Physical data independence is mandatory for building a portable compiler and runtime. Our approach generally abstracts from physical aspects to cover a wide range of structured and semi-structured data models. For efficiency, we present compilation techniques for tailoring and optimizing a query for a concrete platform.
Parallelism is crucial for exploiting modern hardware architectures. We present a novel push-based operator model that uses divide-and-conquer and self-scheduling techniques to create and control parallelism dynamically at runtime.
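A rough Python sketch of what a push-based pipeline with divide-and-conquer task creation might look like follows. It is an illustration under stated assumptions, not the operator model of the thesis: the classes `Filter`, `Map`, and `SumSink`, the function `run_parallel`, and the `threshold` parameter are all hypothetical, and Python threads merely stand in for worker parallelism.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SumSink:
    """Terminal operator: accumulates pushed batches under a lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.total = 0

    def push(self, batch):
        partial = sum(batch)
        with self._lock:
            self.total += partial

class Map:
    """Applies fn to each element and pushes the result downstream."""
    def __init__(self, fn, downstream):
        self.fn, self.downstream = fn, downstream

    def push(self, batch):
        self.downstream.push([self.fn(x) for x in batch])

class Filter:
    """Pushes only the elements that satisfy pred downstream."""
    def __init__(self, pred, downstream):
        self.pred, self.downstream = pred, downstream

    def push(self, batch):
        self.downstream.push([x for x in batch if self.pred(x)])

def run_parallel(data, root, pool, threshold=8):
    """Each task decides for itself whether to split its range further."""
    futures, lock = [], threading.Lock()

    def task(lo, hi):
        # Divide: fork the left half as a new task, keep the right half.
        while hi - lo > threshold:
            mid = (lo + hi) // 2
            fut = pool.submit(task, lo, mid)
            with lock:
                futures.append(fut)
            lo = mid
        # Conquer: push the remaining small range through the pipeline.
        root.push(data[lo:hi])

    task(0, len(data))
    # Drain until no task has forked new work; tasks never block on
    # each other, so a bounded pool cannot deadlock here.
    while True:
        with lock:
            pending, futures[:] = list(futures), []
        if not pending:
            break
        for fut in pending:
            fut.result()  # re-raises any exception from a worker

data = list(range(1, 101))
sink = SumSink()
root = Filter(lambda x: x % 2 == 0, Map(lambda x: x * x, sink))
with ThreadPoolExecutor(max_workers=4) as pool:
    run_parallel(data, root, pool)
```

In this sketch the degree of parallelism is not fixed in the plan: it emerges at runtime from how often tasks split, which is controlled by `threshold` and the input size.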