A Data and Contention Aware Approach to Dynamic Scheduling for Heterogeneous Processors

Gregg, Christopher, Computer Engineering - School of Engineering and Applied Science, University of Virginia
Hazelwood, Kim, Department of Computer Science, University of Virginia

Heterogeneous computers, with a multi-core central processing unit (CPU) and one or more many-core graphics processing units (GPUs) capable of running general purpose applications, are becoming standard on the desktop and in cluster and supercomputing platforms. Using programming languages and extensions such as CUDA or OpenCL, application developers can write code that can run on any of the heterogeneous devices available on the system. As these tools mature, contention for device resources will become more prevalent, especially in systems that utilize an application queue. Current application scheduling methods only consider single applications and preferentially schedule each application for a device regardless of other applications in the queue. This degrades overall computational throughput and leads to device underutilization.

Scheduling application kernels across all heterogeneous components to maximize application throughput is nontrivial. The scheduler must know the location of the data that each kernel will need and the state of the system when each kernel will be launched, and it must be able to predict run times for individual applications. In addition, a scheduler can use historical information about prior kernel run times to refine its decisions.
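As a rough illustration of how these inputs might be combined, the following minimal sketch picks the device with the earliest estimated finish time for a kernel. It is not the dissertation's method: the names `Device`, `Kernel`, `predict_runtime`, `transfer_time`, and `choose_device`, the fixed transfer bandwidth, and the use of a simple mean over past run times are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Device:
    name: str
    busy_until: float = 0.0  # seconds of already-queued work (contention)
    history: dict = field(default_factory=dict)  # kernel name -> past run times (s)


@dataclass
class Kernel:
    name: str
    data_bytes: int
    data_location: str  # name of the device that currently holds the input data


def predict_runtime(device, kernel, default=1.0):
    """Predict run time as the mean of prior runs (assumed, simplistic model)."""
    past = device.history.get(kernel.name)
    return mean(past) if past else default


def transfer_time(device, kernel, bandwidth=8e9):
    """No transfer cost if the data is already local; bandwidth is an assumed constant."""
    if kernel.data_location == device.name:
        return 0.0
    return kernel.data_bytes / bandwidth


def estimated_finish(device, kernel):
    """Queue wait + data transfer + predicted run time."""
    return (device.busy_until
            + transfer_time(device, kernel)
            + predict_runtime(device, kernel))


def choose_device(devices, kernel):
    """Pick the device with the earliest estimated finish time."""
    return min(devices, key=lambda d: estimated_finish(d, kernel))


cpu = Device("cpu", history={"matmul": [4.0, 4.0]})
gpu = Device("gpu", history={"matmul": [1.0, 1.0]})
kernel = Kernel("matmul", data_bytes=8_000_000_000, data_location="cpu")
idle_choice = choose_device([cpu, gpu], kernel).name

gpu.busy_until = 5.0  # another application now occupies the GPU
contended_choice = choose_device([cpu, gpu], kernel).name
```

In this sketch the kernel goes to the GPU while the GPU is idle, even after paying the transfer cost, but falls back to the CPU once the GPU queue is long enough. A scheduler that considered each application in isolation would miss that second case.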

This dissertation investigates these elements of heterogeneous scheduling and develops a novel methodology for efficiently making dynamic scheduling decisions that maximize computational throughput when multiple applications are run on a heterogeneous computer. It presents an innovative taxonomy for describing the data transfer requirements of a heterogeneous application, and it shows that dynamic scheduling for heterogeneous computers benefits from knowledge of system state, including data locality and contention among running processes. Finally, this dissertation investigates a novel technique to increase GPU throughput by running concurrent kernels on a device, based upon their orthogonal use of device resources.

PHD (Doctor of Philosophy)
Heterogeneous Scheduling, GPGPU, Computer Architecture, OpenCL, CUDA, Dynamic Scheduling, Parallel Computing
All rights reserved (no additional license for public reuse)