Authors: Yanpei Chen, Sara Alspaugh, Randy Katz
Keywords:
Abstract: Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged which feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. As interactive, large-scale query processing is a strength of the RDBMS community, it is important that lessons from that field be carried over and applied where possible in this new domain. However, these new workloads have not yet been described in the literature. We fill this gap with an empirical analysis of MapReduce traces from six separate business-critical deployments inside Facebook and at Cloudera customers in e-commerce, telecommunications, media, and retail. Our key contribution is a characterization of new MapReduce workloads which are driven in part by interactive analysis, and which make heavy use of query-like programming frameworks on top of MapReduce. These workloads display diverse behaviors which invalidate prior assumptions about MapReduce such as uniform data access, regular diurnal patterns, and prevalence of large jobs. A secondary contribution is a first step towards creating a TPC-like data processing benchmark for MapReduce.