Authors: Shicong Meng, Arun K. Iyengar, Isabelle M. Rouvellou, Ling Liu, Kisung Lee
Keywords:
Abstract: State monitoring is widely used for detecting critical events and abnormalities of distributed systems. As the scale of such systems grows and the degree of workload consolidation increases in Cloud data centers, node failures and performance interferences, especially transient ones, become the norm rather than the exception. Hence, distributed state monitoring tasks are often exposed to impaired communication caused by such dynamics on different nodes. Unfortunately, existing state monitoring approaches are often designed under the assumption of always-online nodes and reliable inter-node communication. As a result, these approaches often produce misleading results, which in turn introduce various problems for Cloud users who rely on state monitoring results to perform automatic management tasks such as auto-scaling. This paper introduces a new state monitoring approach that tackles this challenge by exposing and handling communication dynamics such as message delay and loss in Cloud monitoring environments. Our approach delivers two distinct features. First, it quantitatively estimates the accuracy of monitoring results to capture the uncertainties introduced by messaging dynamics. This feature helps users distinguish trustworthy monitoring results from ones that deviate heavily from the truth, yet it significantly improves monitoring utility compared with simple techniques that invalidate all monitoring results generated in the presence of messaging dynamics. Second, our approach also adapts to non-transient messaging issues by reconfiguring distributed monitoring algorithms to minimize monitoring errors. Our experimental results show that, even under severe message loss and delay, our approach consistently improves monitoring accuracy, and when applied to Cloud application auto-scaling, it outperforms existing state monitoring techniques in terms of the ability to correctly trigger dynamic provisioning.