Data as of 10 December 2024
Verlag Dr. Hut GmbH, Sternstr. 18, 80538 München, Tel: 0175 / 9263392 (Mon to Fri, 9:00 to 12:00)
Updated on 10 December 2024
ISBN 978-3-8439-0387-5, series: Reihe Informatik (Computer Science)
Michael Duller: Management and Federation of Stream Processing Applications
158 pages, dissertation, Eidgenössische Technische Hochschule (ETH) Zürich (2011), hardcover, A5
A decade ago, stream processing enabled a new class of applications by adopting a processing model fundamentally different from that of conventional database management systems. These applications process large volumes of continuous input streams with high throughput. Data stream management systems have since matured into industrial-strength solutions for this class of applications, using long-running queries to process high volumes of continuous input data with low latency. However, they still lack flexibility in large-scale deployment, integration, extensibility, and interoperability.
In recent years, a substantial ecosystem of new applications has emerged that could benefit from stream processing. These applications range from federating existing but heterogeneous streaming applications, to automatically deploying streaming applications in large clusters or cloud environments, to processing personal information such as photos as data streams. They impose new requirements on how stream processing solutions are deployed, integrated, extended, and federated.
This thesis explores stream processing through traditional stream processing applications as well as applications that process personal information as data streams. It identifies the fundamental properties common to all stream processing systems. The result is a generic model for stream processing and the architecture of a dynamic platform that supports this model. The model separates processing (operators) and data management (buffers) into distinct entities.
The model and platform enable the automated deployment of applications and facilitate the federation of applications running on heterogeneous stream processing systems. They also open up new application domains for stream processing. The thesis validates the generality of the model, its feasibility in terms of overhead, and its claimed benefits for deployment and integration. Experiments on PlanetLab, on a cluster, and on individual nodes confirm the following results. The proposed model and platform enable interoperability between heterogeneous stream processing engines and facilitate distributed deployment. They also add functionality to the engines with negligible overhead, such as replacing operators at runtime or running a distribution-agnostic engine in a distributed setup.
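The separation of processing (operators) from data management (buffers) can be illustrated with a minimal, single-process sketch. It rests on simplifying assumptions: operators are stateless functions, records are plain dicts, and all queued data lives in FIFO buffers. The names `Buffer`, `Operator`, `replace_operator`, and `run_demo` are hypothetical and are not taken from the thesis or its platform. The sketch shows why the separation makes runtime operator replacement straightforward. Queued records belong to the buffers rather than to the operator, so the operator can be swapped without losing in-flight data.

```python
# Hypothetical sketch: all names are illustrative, not the thesis's platform API.
from collections import deque
from typing import Callable, Deque, List, Optional

# A stream record, modeled as a plain dict for simplicity (assumption).
Record = dict
OperatorFn = Callable[[Record], Optional[Record]]


class Buffer:
    """Data management: a FIFO queue that holds records between operators."""

    def __init__(self) -> None:
        self._items: Deque[Record] = deque()

    def push(self, item: Record) -> None:
        self._items.append(item)

    def pop(self) -> Optional[Record]:
        # Returns None when the buffer is empty instead of raising.
        return self._items.popleft() if self._items else None

    def snapshot(self) -> List[Record]:
        # Copy of the queued records, oldest first.
        return list(self._items)

    def __len__(self) -> int:
        return len(self._items)


class Operator:
    """Processing: applies a function to records read from an input buffer.

    The operator holds no queued data itself; everything in flight lives
    in the buffers it is connected to.
    """

    def __init__(self, name: str, fn: OperatorFn, inp: Buffer, out: Buffer) -> None:
        self.name = name
        self.fn = fn
        self.inp = inp
        self.out = out

    def step(self) -> bool:
        """Process one record; return False if the input buffer was empty."""
        item = self.inp.pop()
        if item is None:
            return False
        result = self.fn(item)
        if result is not None:  # None means the record was filtered out
            self.out.push(result)
        return True


def replace_operator(old: Operator, name: str, fn: OperatorFn) -> Operator:
    """Swap the processing logic while keeping the same input/output buffers."""
    return Operator(name, fn, old.inp, old.out)


def run_demo() -> List[int]:
    source, sink = Buffer(), Buffer()
    for temp in [5, 12, 30, 8]:
        source.push({"temp": temp})

    op = Operator("above_10", lambda r: r if r["temp"] > 10 else None, source, sink)
    op.step()  # consumes 5 (filtered out)
    op.step()  # consumes 12 (forwarded to sink)

    # Replace the operator at runtime; 30 and 8 are still queued in `source`.
    op = replace_operator(op, "above_20", lambda r: r if r["temp"] > 20 else None)
    while op.step():
        pass
    return [r["temp"] for r in sink.snapshot()]


if __name__ == "__main__":
    print(run_demo())  # [12, 30]
```

Records queued before the swap (here 30 and 8) are processed by the new operator because the old one never owned them. A real platform would also have to handle stateful operators, concurrency, and distribution across nodes, all of which this sketch deliberately omits.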