Authors: Magdalena Balazinska, Mehul A. Shah, Jeong-Hyon Hwang
DOI:
Keywords:
Abstract: DEFINITION Just like any other software system, a data stream management system (DSMS) can experience failures of its different components. Failures are especially common in distributed DSMSs, where query operators are spread across multiple processing nodes, i.e., independent processes typically running on different physical machines in a local-area network (LAN) or a wide-area network (WAN). Failures of processing nodes or of the underlying communication network can cause continuous queries (CQ) in a DSMS to stall or to produce erroneous results. These failures can adversely affect critical client applications relying on these queries. Traditionally, availability has been defined as the fraction of time that a system remains operational and properly servicing requests. In DSMSs, however, availability often also incorporates end-to-end latencies, as applications need to quickly react to real-time events and thus can tolerate only small delays. A DSMS can handle failures using a variety of techniques that offer different levels of availability depending on application needs. All fault-tolerance methods rely on some form of replication, where the volatile state is stored in multiple, independent locations to protect against failures. This article describes several such methods for DSMSs that offer different trade-offs between availability and runtime overhead while maintaining consistency. For cases of network partitions, it outlines techniques that avoid stalling the query at the cost of temporary inconsistency, thereby providing the highest availability. This article focuses on failures within a DSMS and does not discuss failures of the data sources or client applications.