Authors: Robert Grandl, Arjun Singhvi, Raajay Viswanathan, Aditya Akella
DOI:
Keywords:
Abstract: Today’s data analytics frameworks are intrinsically compute-centric. Key details of analytics execution (work allocation to distributed compute tasks, intermediate data storage, task scheduling, etc.) depend on the pre-determined physical structure of the high-level computation. Unfortunately, this hurts flexibility, performance, and efficiency. We present F2, a new analytics framework that cleanly separates computation from intermediate data. It enables runtime visibility into data via programmable monitoring, and data-driven computation (where intermediate data values drive when and what computation runs) via an event abstraction. Experiments with an F2 prototype on a large cluster using batch, streaming, and graph analytics workloads show that it significantly outperforms state-of-the-art compute-centric engines.