Rethinking the Architecture of Warehouse-Scale Computers

Mars, Jason, Computer Science - School of Engineering and Applied Science, University of Virginia
Soffa, Mary, Department of Computer Science, University of Virginia

As the world’s computation continues to move into the massive datacenter infrastructures recently termed “warehouse-scale computers” (WSCs), developing highly efficient systems for these computing platforms has become increasingly critical.

The architecture of modern WSCs remains in its relative infancy. In designing modern WSCs, architects start with commodity off-the-shelf components, including commodity processors and open source system software components. These components are then stitched together to design a simple and cost-effective WSC. While this approach has been effective for producing systems that are functional and can scale the delivery of web services as demand increases, efficiency has suffered. The commodity components and system software used have not been designed and refined with the unique characteristics of WSCs in mind, and these characteristics may be critical for a highly efficient WSC design. As such, we must rethink the architecture of modern WSCs.

This dissertation argues that one such characteristic has been overlooked: the diversity in execution environments in modern WSCs. We define a given task’s execution environment as the coupling of the machine configuration and the co-running tasks simultaneously executing alongside the given task. At any given time in a WSC, we have a high degree of diversity across these execution environments. This dissertation argues that acknowledging, exploiting, and adapting to the diversity in execution environments are critical for the design of a highly efficient WSC. When this diversity is ignored, three critical design problems arise: 1) the homogeneous assumption, where all machines and cores in a WSC are assumed to be equal and managed accordingly, 2) the rigidness of applications, where application binaries cannot adapt to changes across and within execution environments, and 3) the oblivion of interference, where interference between tasks within an execution environment cannot be measured or managed.

This dissertation addresses each of these three design problems. First, we address the homogeneous assumption at the cluster level by redesigning the task manager in the WSC to learn which execution environments tasks prefer and map them accordingly. Second, we address the rigidness of applications at the machine level by providing a mechanism to allow applications to adapt to their execution environment, then leverage this mechanism to solve pressing problems in WSCs. Lastly, we address the oblivion of interference at both the cluster and machine levels by providing novel metrics and techniques for measuring and managing interference to improve the utilization of WSCs.
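
To make the first of these ideas concrete, the following is a minimal illustrative sketch, not the dissertation's actual task manager, of how a manager might learn which execution environments tasks prefer and map them accordingly. The names `PreferenceLearner`, `observe`, `mean_score`, and `map_tasks` are hypothetical, as are the assumptions that preference is summarized by a mean of observed performance scores and that mapping is done greedily.

```python
from collections import defaultdict


class PreferenceLearner:
    """Hypothetical helper: tracks the mean observed performance score
    of each task type in each execution environment."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def observe(self, task_type, environment, score):
        # Record one performance observation (higher is assumed better).
        key = (task_type, environment)
        self._sum[key] += score
        self._count[key] += 1

    def mean_score(self, task_type, environment):
        # Unobserved pairings default to 0.0 (an assumption of this sketch).
        key = (task_type, environment)
        if self._count[key] == 0:
            return 0.0
        return self._sum[key] / self._count[key]


def map_tasks(learner, tasks, free_slots):
    """Greedily place each (task_id, task_type) in the environment with
    free capacity that has the highest learned mean score."""
    slots = dict(free_slots)  # copy so the caller's capacities are untouched
    placement = {}
    for task_id, task_type in tasks:
        candidates = [env for env, n in slots.items() if n > 0]
        if not candidates:
            break  # no capacity left; remaining tasks stay unplaced
        best = max(candidates, key=lambda env: learner.mean_score(task_type, env))
        placement[task_id] = best
        slots[best] -= 1
    return placement


learner = PreferenceLearner()
learner.observe("search", "machine_a", 1.0)
learner.observe("search", "machine_b", 0.6)
learner.observe("batch", "machine_a", 0.7)
learner.observe("batch", "machine_b", 0.9)

placement = map_tasks(
    learner,
    [("t1", "search"), ("t2", "batch"), ("t3", "search")],
    {"machine_a": 1, "machine_b": 2},
)
print(placement)
```

A greedy policy is used here only for brevity; it does not account for co-running tasks, which the abstract treats as part of the execution environment.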

By incorporating an awareness of the diversity in execution environments in these three key design areas, we produce a WSC design that is significantly more efficient in both the performance of the applications that live in this domain and the utilization of compute resources in the WSC. By improving efficiency on these two metrics, we effectively require a smaller WSC for a fixed workload, which reduces not only the cost of these systems but also their environmental footprint.

PHD (Doctor of Philosophy)
Warehouse-Scale Computers, Datacenters, Multicore, Architecture, Compilers, Profiling Techniques, Bubble-Up, Utilization, Software Systems, Runtime Systems
All rights reserved (no additional license for public reuse)
Issued Date: