Interactive analytical processing in big data systems

Authors: Yanpei Chen, Sara Alspaugh, Randy Katz

DOI: 10.14778/2367502.2367519

Abstract: Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged which feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. As interactive, large-scale query processing is a strength of the RDBMS community, it is important that lessons from that field be carried over and applied where possible in this new domain. However, these new workloads have not yet been described in the literature. We fill this gap with an empirical analysis of MapReduce traces from six separate business-critical deployments inside Facebook and at Cloudera customers in e-commerce, telecommunications, media, and retail. Our key contribution is a characterization of new MapReduce workloads which are driven in part by interactive analysis, and which make heavy use of query-like programming frameworks on top of MapReduce. These workloads display diverse behaviors which invalidate prior assumptions about MapReduce such as uniform data access, regular diurnal patterns, and prevalence of large jobs. A secondary contribution is a first step towards creating a TPC-like data processing benchmark for MapReduce.
