Design and implementation of an efficient data stream processing system

Author: Ali Salehi

DOI: 10.5075/EPFL-THESIS-4611

Abstract: In standard database scenarios, an end user assumes that all data (e.g., sensor readings) is stored in a database. Therefore, one can simply submit arbitrarily complex processing, in the form of SQL queries or procedures, to the server. Data-stream-oriented applications, however, typically deal with huge volumes of data, and storing this data and processing it off-line can be costly, time-consuming, and impractical. This work describes our research results from designing and implementing an efficient data stream management system for online processing of data streams in the field of environmental monitoring. Our target data sources are wireless sensor networks. Although we focus on a specific application domain, the thesis is designed in a generic way, so that its results can be applied to a wide variety of applications. The thesis starts by presenting the state of the art in data stream processing, specifically window concepts, continuous queries, data filtering, query languages, and in-network query processing (in particular, TinyOS-based approaches). We present the key existing stream processing engines, their internal architectures, and how they compare to our platform, namely the Global Sensor Network (GSN) middleware. The GSN middleware enables fast and flexible deployment and interconnection of sensor networks. It provides simple and uniform access to a comprehensive set of heterogeneous technologies. Additionally, GSN offers zero-programming, data-oriented integration of sensor networks and supports dynamic reconfiguration and adaptation at runtime. GSN is built on the virtual sensor concept, which provides a high-level view of data sources, and on its powerful declarative specification tools. Furthermore, we describe the conceptual, architectural, and optimization decisions behind the design of the GSN platform in detail. To achieve high efficiency when processing large data streams, GSN uses window-based algorithms and techniques that intelligently group and process different types of queries. While adapting GSN to large-scale sensor network deployments, we encountered several performance bottlenecks. One of the challenges we faced was the scalable delivery of high-rate data streams. We found that we could dramatically improve the performance of the stream processor by grouping user queries, and hence sharing both processing and memory costs among similar queries. Moreover, we address the issue of scheduling: the problem of efficiently scheduling the execution of queries with sliding window parameters has not been addressed in depth in the literature.
This problem becomes more severe when one considers […]. In these cases, our scheduler not only increases […] by at least an order of magnitude but also decreases response […] requirements. Finally, we integrated GSN with an external visualization framework, Microsoft's SenseWeb platform, which makes the GSN data gathering infrastructure globally accessible to end users. The GSN deployment (which was initiated by the Swiss Experiment project and demanded by its users) shows the scalability of GSN combined with the optimized algorithms, and also demonstrates its flexibility.
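The abstract states that grouping user queries lets similar queries share processing and memory costs. The Python sketch below illustrates that general idea under simplifying assumptions of our own: a single stream, count-based sliding windows with a size and a slide, and the hypothetical names `WindowQuery` and `SharedWindowGroup`. It is not GSN's implementation; the record does not describe how the thesis actually groups or schedules queries.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WindowQuery:
    """Hypothetical continuous query over a count-based sliding window."""
    name: str
    size: int                                  # window size in tuples
    slide: int                                 # evaluate after every `slide` new tuples
    aggregate: Callable[[List[float]], float]


class SharedWindowGroup:
    """Hypothetical group of queries on one stream that share a single buffer.

    The buffer is sized for the largest window in the group; each query reads
    only its own most recent `size` tuples, so buffer memory is paid once per
    group rather than once per query.
    """

    def __init__(self, queries: List[WindowQuery]) -> None:
        self.queries = list(queries)
        self.buffer: deque = deque(maxlen=max(q.size for q in self.queries))
        self.count = 0  # tuples seen so far

    def push(self, value: float) -> Dict[str, float]:
        """Append one tuple and evaluate every query whose slide boundary is reached."""
        self.buffer.append(value)  # a deque with maxlen drops the oldest tuple
        self.count += 1
        due = [q for q in self.queries if self.count % q.slide == 0]
        if not due:
            return {}
        snapshot = list(self.buffer)  # copied once, shared by all due queries
        # Before the buffer fills, windows are partial (fewer than `size` tuples).
        return {q.name: q.aggregate(snapshot[-q.size:]) for q in due}


def mean(window: List[float]) -> float:
    return sum(window) / len(window)


group = SharedWindowGroup([
    WindowQuery("avg_last3", size=3, slide=1, aggregate=mean),
    WindowQuery("max_last5", size=5, slide=2, aggregate=max),
])
outputs = [group.push(v) for v in [1, 2, 3, 4, 5, 6]]
print(outputs[-1])  # {'avg_last3': 5.0, 'max_last5': 6}
```

Sizing one buffer for the largest window is only one possible sharing strategy; a real engine would also have to handle time-based windows, different predicates, and the scheduling of many slide boundaries, which this sketch leaves out.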
