Authors: Huan Chen, Yijie Wang, Yuan Wang, Xingkong Ma
Keywords:
Abstract: The big data era is characterized by the emergence of live data with high volume and a fast arrival rate. This poses a new challenge to stream processing applications: how to process unbounded data in real time with high throughput. The sliding window technique is widely used to handle unbounded data by storing the most recent history of the streams. However, existing centralized solutions cannot satisfy the requirements for high processing capacity and low latency because of the single-node bottleneck. Moreover, studies on distributed windows primarily focus on specific operators, while a general framework for various window-based operators is still wanted. In this paper, we first classify window-based operators into two categories: data-independent operators and data-dependent operators. Then, we propose GDSW, a general framework for distributed count-based sliding windows, which can handle both categories. Besides, in order to balance the system load, we further propose a dynamic load balancing algorithm called DAD, based on buffer usage. Our framework is implemented on Apache Storm 0.10.0. Extensive evaluation shows that GDSW can achieve sub-second latency and a 10X improvement in throughput compared to centralized processing when the data arrival rate is rapid or the window size is large.
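The abstract does not say how a count-based window is spread across nodes. The following is a minimal sketch under stated assumptions, not GDSW's actual design. It assumes a chained layout in which each of `k` hypothetical workers holds an equal slice of a window of `size` tuples, and a tuple expiring from one slice flows into the next. It also assumes a sum aggregate, whose per-slice partial results can simply be added, as an illustration of an operator that can be evaluated in a distributed way. The class names `CountWindow` and `DistributedCountWindow` are hypothetical.

```python
from collections import deque


class CountWindow:
    """Count-based sliding window that keeps only the `size` most recent tuples."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("window size must be positive")
        self.size = size
        self.items = deque()
        self.total = 0  # running sum, maintained incrementally on insert/expire

    def insert(self, value):
        """Append a tuple; return the tuple that expired, or None if none did."""
        self.items.append(value)
        self.total += value
        if len(self.items) > self.size:
            expired = self.items.popleft()
            self.total -= expired
            return expired
        return None


class DistributedCountWindow:
    """Hypothetical sketch: a window of `size` tuples split into `k` chained
    sub-windows, each standing in for one worker node. Tuples expiring from
    sub-window i are passed to sub-window i + 1, so together the sub-windows
    hold exactly the newest `size` tuples."""

    def __init__(self, size, k):
        if size % k != 0:
            raise ValueError("this sketch assumes size is divisible by k")
        self.parts = [CountWindow(size // k) for _ in range(k)]

    def insert(self, value):
        carry = value
        for part in self.parts:
            carry = part.insert(carry)
            if carry is None:  # nothing expired, so later slices are unchanged
                break
        # A tuple still carried past the last slice has left the whole window.

    def window_sum(self):
        """Merge per-node partial sums into the result for the whole window."""
        return sum(part.total for part in self.parts)


if __name__ == "__main__":
    window = DistributedCountWindow(size=4, k=2)
    for v in range(1, 11):
        window.insert(v)
    print(window.window_sum())  # sum of the newest 4 tuples: 7 + 8 + 9 + 10
```

In this sketch, each slice only keeps its own partial sum, so answering a query costs one value per worker rather than a scan of the whole window. Operators whose results depend on comparing individual tuples across slices would need more than this simple merge step.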