Workload-driven design and evaluation of large-scale data-centric systems

Authors: Yanpei Chen, Randy H. Katz


Abstract: Large-scale data-centric systems help organizations store, manipulate, and derive value from large volumes of data. They consist of distributed components spread across a scalable number of connected machines and involve complex software/hardware stacks with multiple semantic layers. These systems solve established problems involving large amounts of data, while catalyzing new, data-driven businesses such as search engines, social networks, and cloud computing and data storage service providers. The complexity, diversity, scale, and rapid evolution of large-scale data-centric systems make it challenging to develop intuition about these systems, gain operational experience, and improve performance. It is an important research problem to develop a method to design and evaluate such systems based on the empirical behavior of the targeted workloads. Using an unprecedented collection of nine industrial workload traces of business-critical large-scale data-centric systems, we develop a workload-driven design and evaluation method for these systems and apply the method to address previously unsolved problems. Specifically, the dissertation contributes the following:

1. A conceptual framework of breaking down workloads into data access patterns, computation patterns, and load arrival patterns.
2. A workload analysis and synthesis method that uses multi-dimensional, non-parametric statistics to extract insights and produce representative behavior.
3. Case studies of workload analysis for industrial deployments of MapReduce and enterprise network storage systems, two examples of large-scale data-centric systems.
4. Case studies of workload-driven design and evaluation of an energy-efficient MapReduce system and Internet datacenter network transport protocol pathologies, two topics that require workload-specific insights to address.

Overall, the dissertation develops a more objective and systematic understanding of an emerging class of computer systems. The work in this dissertation helps further accelerate the adoption of large-scale data-centric systems to solve real-life problems relevant to business, science, and day-to-day consumers.


