GDSW: A General Framework for Distributed Sliding Window over Data Streams

Authors: Huan Chen, Yijie Wang, Yuan Wang, Xingkong Ma

DOI: 10.1109/ICPADS.2016.0100

Abstract: The big data era is characterized by the emergence of live data with high volume and fast arrival rate, which poses a new challenge to stream processing applications: how to process unbounded live data in real time with high throughput. The sliding window technique is widely used to handle unbounded live data by storing the most recent history of streams. However, existing centralized solutions cannot satisfy the requirements for high processing capacity and low latency due to the single-node bottleneck. Moreover, existing studies on distributed sliding windows primarily focus on specific operators, while a general framework for processing various window-based operators is still wanted. In this paper, we first classify window-based operators into two categories: data-independent operators and data-dependent operators. Then, we propose GDSW, a general framework for distributed count-based sliding windows, which can process both data-independent and data-dependent operators. Besides, in order to balance the system load, we further propose a dynamic load balance algorithm called DAD, based on buffer usage. Our framework is implemented on Apache Storm 0.10.0. Extensive evaluation shows that GDSW can achieve sub-second latency and a 10X improvement in throughput compared with centralized processing when handling a rapid data rate or a large window size.
