Authors: Zeinab Shmeis, Mohamad Jaber
DOI: 10.1016/J.FUTURE.2019.03.044
Keywords:
Abstract: Spark is the leading platform for distributed large-scale data processing. Spark's Application Programming Interface (API) offers powerful, easy-to-use abstractions related to functional programming (e.g., map, filter, reduce) in several different languages. However, writing efficient Spark applications is still error-prone and time-consuming, and it requires a clear and deep understanding of the inner workings of Spark. For instance, the same task can be implemented in several ways, yet the execution time can vary drastically between them. To address this, we introduce TaBOS, a rewrite-based optimizer for Spark programs. TaBOS takes a Spark job and automatically generates a state-space of equivalent optimized jobs using a set of semantics-preserving rewrite rules. Then, from the generated state-space, it selects one optimal program based on a predefined strategy. We define selection strategies (e.g., the program with the maximum number of applied rules, or with the minimum number of heavy operations) for identifying the optimal program in the state-space. We evaluate the effectiveness, robustness, and speedup gain of our solutions on case studies.
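
The workflow described in the abstract (generate a state-space of equivalent jobs with rewrite rules, then select one by a strategy) can be illustrated with a minimal sketch. This is not the TaBOS implementation: jobs are modeled as symbolic tuples rather than real Spark programs, and the three rewrite rules, the `HEAVY_OPS` set, and every function name below are hypothetical choices made for illustration only.

```python
# Toy sketch only: jobs are symbolic tuples of (operation, argument) pairs,
# so nothing here needs a Spark installation. The rule set, HEAVY_OPS, and
# all function names are illustrative assumptions, not TaBOS's actual design.
from collections import deque

HEAVY_OPS = {"groupByKey", "sortByKey"}  # assumption: what counts as "heavy"


def fuse_filters(job):
    """filter(p) ; filter(q)  =>  filter(p and q)"""
    for i in range(len(job) - 1):
        (op1, p), (op2, q) = job[i], job[i + 1]
        if op1 == "filter" and op2 == "filter":
            yield job[:i] + (("filter", f"({p}) and ({q})"),) + job[i + 2:]


def filter_before_sort(job):
    """sortByKey ; filter(p)  =>  filter(p) ; sortByKey"""
    for i in range(len(job) - 1):
        if job[i][0] == "sortByKey" and job[i + 1][0] == "filter":
            yield job[:i] + (job[i + 1], job[i]) + job[i + 2:]


def group_sum_to_reduce(job):
    """groupByKey ; mapValues(sum)  =>  reduceByKey(add)"""
    for i in range(len(job) - 1):
        if job[i] == ("groupByKey", None) and job[i + 1] == ("mapValues", "sum"):
            yield job[:i] + (("reduceByKey", "add"),) + job[i + 2:]


RULES = [fuse_filters, filter_before_sort, group_sum_to_reduce]


def explore(job, rules):
    """Return {equivalent job: largest number of rules applied to reach it}.

    Assumes the rules cannot cycle (each one shortens the job or moves a
    filter to the left), so the worklist eventually empties.
    """
    depth = {job: 0}
    work = deque([job])
    while work:
        cur = work.popleft()
        for rule in rules:
            for nxt in rule(cur):
                if depth.get(nxt, -1) < depth[cur] + 1:
                    depth[nxt] = depth[cur] + 1
                    work.append(nxt)
    return depth


def heavy_count(job):
    return sum(1 for op, _ in job if op in HEAVY_OPS)


def select_max_rules(space):
    """Strategy: the job reached by the most rule applications."""
    return max(space, key=space.get)


def select_min_heavy(space):
    """Strategy: fewest heavy operations, shorter jobs preferred on ties."""
    return min(space, key=lambda j: (heavy_count(j), len(j)))


job = (
    ("groupByKey", None),
    ("mapValues", "sum"),
    ("sortByKey", None),
    ("filter", "p"),
    ("filter", "q"),
)
space = explore(job, RULES)
best = select_max_rules(space)
```

In this toy example the input job has two heavy operations, while the job chosen by `select_max_rules` replaces the group-and-sum with `reduceByKey`, fuses the two filters, and moves the fused filter ahead of the sort. Swapping in `select_min_heavy` shows how a different predefined strategy can be applied to the same state-space.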