Scheduling Parallel Computations in a Heterogeneous Environment

Weissman, Jon B., Department of Computer Science, University of Virginia
Grimshaw, Andrew, Department of Computer Science, University of Virginia

A metasystem is a shared ensemble of workstations, vector machines, and parallel machines connected by local- and wide-area networks. The large array of heterogeneous resources in the metasystem offers an opportunity for delivering high performance on a range of applications. Achieving high performance requires effective scheduling of system resources.
This dissertation explores one dimension of the scheduling problem: automatic scheduling of data parallel computations in local-area metasystems containing workstations and multicomputers. Scheduling requires that the problem be decomposed into a set of tasks and data that are assigned to processors in a manner that reduces completion time. Problem decomposition is known as partitioning, and task assignment is known as placement. Scheduling also requires that the best subset of available processors be selected. No existing system solves all of these problems. We show that scheduling can be performed automatically, efficiently, and profitably for a range of parallel computations in this environment. A framework has been developed to study the scheduling problem. The framework implements several scheduling heuristics that automate processor selection, partitioning, and placement. At the heart of the framework is a model for representing program and system resource information. From this information, a set of cost functions is constructed to predict the computation and communication costs that guide the scheduling process. Scheduling results in a load-balanced decomposition of the problem at an appropriate computation granularity.
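
The abstract does not give the form of the framework's cost functions or heuristics, so the following is only a minimal sketch of the general idea under stated assumptions, not the dissertation's method. It assumes each processor has a measured per-element compute cost (the hypothetical field `secs_per_element`) and that communication cost grows linearly with the number of processors (the hypothetical `comm_cost` with made-up `latency` and `per_proc` constants). Processor selection is a simple greedy stand-in: try the k fastest processors for every k and keep the k with the lowest predicted completion time. Partitioning then splits the data in proportion to processor speed so that every selected processor finishes its share at about the same time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    secs_per_element: float  # hypothetical measured compute cost per data element


def comm_cost(k: int, latency: float = 0.01, per_proc: float = 0.01) -> float:
    """Hypothetical communication model: zero on one processor, linear in k otherwise."""
    return 0.0 if k == 1 else latency + per_proc * k


def predicted_time(procs: list[Processor], n_elements: int) -> float:
    """Predicted completion time when work is split in proportion to processor speed."""
    aggregate_rate = sum(1.0 / p.secs_per_element for p in procs)  # elements per second
    return n_elements / aggregate_rate + comm_cost(len(procs))


def select_processors(procs: list[Processor], n_elements: int) -> list[Processor]:
    """Try the k fastest processors for every k; keep the k with the lowest prediction."""
    ranked = sorted(procs, key=lambda p: p.secs_per_element)
    best_k = min(range(1, len(ranked) + 1),
                 key=lambda k: predicted_time(ranked[:k], n_elements))
    return ranked[:best_k]


def partition(procs: list[Processor], n_elements: int) -> dict[str, int]:
    """Assign element counts proportional to speed, rounding by largest remainder."""
    aggregate_rate = sum(1.0 / p.secs_per_element for p in procs)
    shares = [n_elements * (1.0 / p.secs_per_element) / aggregate_rate for p in procs]
    counts = [int(s) for s in shares]
    leftover = n_elements - sum(counts)  # elements lost to truncation
    by_remainder = sorted(range(len(procs)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return {p.name: c for p, c in zip(procs, counts)}


if __name__ == "__main__":
    # Hypothetical pool: two workstations, one fast multicomputer node, one slow workstation
    pool = [
        Processor("ws1", 1e-6),
        Processor("ws2", 2e-6),
        Processor("mc_node", 0.5e-6),
        Processor("slow_ws", 1e-5),
    ]
    chosen = select_processors(pool, 1_000_000)
    print([p.name for p in chosen])
    print(partition(chosen, 1_000_000))
```

With these assumed numbers, the sketch selects `mc_node`, `ws1`, and `ws2` (predicted time about 0.326 s) and leaves out `slow_ws`, because adding it raises the predicted time to about 0.328 s: its extra compute rate no longer pays for the added communication cost. The resulting split is 571429, 285714, and 142857 elements, which reflects the 4:2:1 speed ratio of the selected processors.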
A simulator of the framework, called Prophesy, and an implementation of the framework in the Legion parallel processing system, called Prophet, have been completed. The Legion implementation has been applied to a number of real data parallel applications. The results indicate that excellent performance is obtained, scheduling overhead is small, and the costs of heterogeneous parallel processing (format conversion and routing) can be tolerated. A simulation study confirms the performance results and is in turn validated by the experimental results.

PHD (Doctor of Philosophy)
Scheduling, data, computers
All rights reserved (no additional license for public reuse)