Interactive analytical processing in big data systems

Authors: Yanpei Chen, Sara Alspaugh, Randy Katz

Abstract: Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important workloads have emerged which feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. As interactive query processing is a strength of the RDBMS community, it is important that lessons from that field be carried over and applied where possible in this domain. However, these workloads have not yet been described in the literature. We fill this gap with an empirical analysis of MapReduce traces from six separate business-critical deployments inside Facebook and at Cloudera customers in e-commerce, telecommunications, media, and retail. Our key contribution is a characterization of these workloads, which are driven in part by interactive analysis and make heavy use of query-like programming frameworks on top of MapReduce. These workloads display behaviors that invalidate prior assumptions about MapReduce, such as uniform data access, regular diurnal patterns, and the prevalence of large jobs. A secondary contribution is a first step towards creating a TPC-like benchmark.
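
The assumptions mentioned above (uniform data access, regular diurnal patterns, prevalence of large jobs) correspond to simple statistics that can be computed from a job trace. The following is a minimal sketch, not the authors' methodology: the `Job` record, its fields (`input_path`, `input_bytes`, `submit_hour`), and the size threshold are hypothetical assumptions, since real MapReduce trace schemas differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical trace record; real trace schemas differ.
    input_path: str
    input_bytes: int
    submit_hour: int  # hour of day, 0-23


def small_job_fraction(jobs, threshold_bytes):
    """Fraction of jobs whose input size is below threshold_bytes."""
    if not jobs:
        return 0.0
    return sum(j.input_bytes < threshold_bytes for j in jobs) / len(jobs)


def top_file_access_share(jobs, top_fraction=0.1):
    """Share of all accesses that hit the most-accessed top_fraction of files.

    If every file were accessed equally often, this would be close to
    top_fraction; a much larger value indicates skewed (non-uniform) access.
    """
    counts = Counter(j.input_path for j in jobs)
    if not counts:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, int(len(ranked) * top_fraction))  # always keep at least one file
    return sum(ranked[:k]) / sum(ranked)


def hourly_submissions(jobs):
    """Count of jobs submitted in each hour of the day, for spotting diurnal patterns."""
    hist = [0] * 24
    for j in jobs:
        hist[j.submit_hour % 24] += 1
    return hist
```

Comparing `top_file_access_share` against `top_fraction` gives a quick, threshold-free indicator of access skew, while the hourly histogram can be inspected for (or against) a regular day/night cycle.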
