Author: Yanpei Chen, Randy H. Katz
DOI:
Keywords:
Abstract: Large-scale data-centric systems help organizations store, manipulate, and derive value from large volumes of data. They consist of distributed components spread across a scalable number of connected machines, and they involve complex software/hardware stacks with multiple semantic layers. These systems help solve established problems involving large amounts of data, while catalyzing new, data-driven businesses such as search engines, social networks, and cloud computing and data storage service providers. The complexity, diversity, scale, and rapid evolution of large-scale data-centric systems make it challenging to develop intuition about these systems, gain operational experience, and improve performance. It is an important research problem to develop a method to design and evaluate such systems based on the empirical behavior of the targeted workloads. Using an unprecedented collection of nine industrial workload traces of business-critical large-scale data-centric systems, we develop a workload-driven design and evaluation method for these systems and apply the method to address previously unsolved design problems. Specifically, the dissertation contributes the following: 1. A conceptual framework of breaking down workloads for large-scale data-centric systems into data access patterns, computation patterns, and load arrival patterns. 2. A workload analysis and synthesis method that uses multi-dimensional, non-parametric statistics to extract insights and produce representative behavior. 3. Case studies of workload analysis for industrial deployments of MapReduce and enterprise network storage systems, two examples of large-scale data-centric systems. 4. Case studies of workload-driven design and evaluation of an energy-efficient MapReduce system and Internet datacenter network transport protocol pathologies, two research topics that require workload-specific insights to address. Overall, the dissertation develops a more objective and systematic understanding of an emerging and important class of computer systems. The work in this dissertation helps further accelerate the adoption of large-scale data-centric systems to solve real-life problems relevant to business, science, and day-to-day consumers.