Methods Toward Automatic Configuration of Computing Environments for Application Execution

Sarnowska-Upton, Karolina, Computer Science - School of Engineering and Applied Science, University of Virginia
Grimshaw, Andrew, Department of Computer Science, University of Virginia

Computing resources are now ubiquitous, and computational research techniques permeate all disciplines. However, exploiting available resources can be a much more complicated proposition: there is no guarantee that one can use a compute resource with no more effort than copying binaries and data. Because computing resources are usually heterogeneous in both hardware and software configuration, many requirements must be matched to execute a computation on a new resource in a new environment. The difficulties increase for parallel computations, which add a layer of dependencies on the libraries that implement the Message Passing Interface (MPI) standard.

Unfortunately, existing techniques for managing the migration process are either inadequate or require a non-trivial amount of effort and experience. In particular, schedulers are generally not designed to capture a computation's software-related requirements and thus depend on users to configure such dependencies. Additionally, the set of possible sites where a computation could be scheduled is limited to those where it is known to be able to run -- a determination that, in the current state of the art, is made manually by the user. This process, which requires enumerating dependencies, checking for them and making them available in new environments, and potentially recompiling the computation, can take many hours of labor. The difficulty is compounded by the fact that many researchers in disciplines that were not traditionally compute-heavy may lack experience with configuring even a single environment, let alone with migrating a computation from one environment to another.

An ideal solution for providing deployment, and therefore scheduling, freedom would allow any computation to be run quickly and easily, with tuned performance, on the computing resources available. Before addressing the difficult but secondary issues of automatic recompilation and tuning, the first step toward an ideal solution is to consider how additional scheduling freedom could be achieved with minimal interaction from the user and without modification of the computation code. Such a solution would automatically enable application binaries to be run quickly and easily on new resources. Our hypothesis is that methods for automatically gathering information about execution requirements and composing site-specific instructions that configure those requirements at target environments are more efficient than manual methods for preparing multiple shared computing environments to execute MPI binaries. Achieving this initial solution alone can dramatically improve researchers' ability to take advantage of the variety of computing resources available to them and, as a result, to carry out more and better research.
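One way to gather a binary's execution requirements automatically is to ask the dynamic loader which shared libraries it needs and which of them it cannot resolve at the current site. The sketch below assumes glibc-style `ldd` output and only parses text, so it never runs the binary; the function names `parse_ldd` and `unresolved` are hypothetical illustrations, not the dissertation's implementation, and a real tool would also need to capture non-library requirements such as the MPI launcher.

```python
import re
from typing import Dict, List, Optional

# Matches glibc-style ldd lines such as
#   libmpi.so.40 => /opt/openmpi/lib/libmpi.so.40 (0x00007f2b1c000000)
#   libgfortran.so.5 => not found
# Lines without "=> /path" or "=> not found" (the vDSO, the dynamic loader) are ignored.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|/\S+)")


def parse_ldd(output: str) -> Dict[str, Optional[str]]:
    """Map each required shared library to its resolved path, or None if unresolved."""
    deps: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match is None:
            continue
        name, target = match.groups()
        deps[name] = None if target == "not found" else target
    return deps


def unresolved(deps: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of libraries the dynamic loader could not find, sorted."""
    return sorted(name for name, path in deps.items() if path is None)


if __name__ == "__main__":
    # Illustrative output only; on a real system this text could come from
    # subprocess.run(["ldd", "./app"], capture_output=True, text=True).stdout
    sample = (
        "\tlinux-vdso.so.1 (0x00007ffc8a1f2000)\n"
        "\tlibmpi.so.40 => /opt/openmpi/lib/libmpi.so.40 (0x00007f2b1c000000)\n"
        "\tlibgfortran.so.5 => not found\n"
        "\t/lib64/ld-linux-x86-64.so.2 (0x00007f2b1c400000)\n"
    )
    deps = parse_ldd(sample)
    print(unresolved(deps))  # ['libgfortran.so.5']
```

The paths and library names in the sample are made up for illustration; `ldd` output formats differ slightly across C libraries, which is why the pattern accepts only an absolute path or `not found` after `=>`.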

Because of the additional requirements that arise when using MPI, our research focuses on enabling deployment, and therefore scheduling, freedom for parallel computations written against the MPI standard on high performance computing clusters. Specifically, to determine whether a binary will be able to run without modification, we modeled how to form predictions about execution readiness and assessed which execution-blocking issues could be resolved without recompilation. We examined the effectiveness of these methods by testing their implementation. We also investigated the assumed difficulty of the migration process by measuring how long researchers take to get computations running at new computing sites, and used this baseline to quantify the time savings of the presented solution.
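A deliberately simplified sketch of a readiness prediction: given which libraries a binary needs at a target site (resolved path, or `None` if missing) and a hypothetical per-site catalog mapping library names to environment modules, it classifies the binary as ready, configurable without recompilation, or not ready, and composes `module load` lines as site-specific setup. The catalog, module names, and three-way classification are assumptions for illustration, not the model evaluated in the dissertation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Prediction:
    status: str  # "ready", "configurable", or "not ready"
    setup: List[str] = field(default_factory=list)     # shell lines to run before launch
    blocking: List[str] = field(default_factory=list)  # libraries with no known provider


def predict_readiness(deps: Dict[str, Optional[str]],
                      site_catalog: Dict[str, str]) -> Prediction:
    """Classify a binary at one site from its resolved/unresolved library dependencies.

    deps maps library name -> resolved path (None if the loader cannot find it).
    site_catalog is a hypothetical map from library name -> environment module
    that provides it at this site.
    """
    missing = sorted(name for name, path in deps.items() if path is None)
    if not missing:
        return Prediction("ready")

    modules: List[str] = []
    blocking: List[str] = []
    for lib in missing:
        module = site_catalog.get(lib)
        if module is None:
            blocking.append(lib)
        elif module not in modules:  # load each module once, in first-seen order
            modules.append(module)

    if blocking:
        return Prediction("not ready", blocking=blocking)
    return Prediction("configurable", setup=[f"module load {m}" for m in modules])


if __name__ == "__main__":
    deps = {"libmpi.so.40": "/opt/openmpi/lib/libmpi.so.40", "libgfortran.so.5": None}
    catalog = {"libgfortran.so.5": "gcc/9.3.0"}  # hypothetical module name
    print(predict_readiness(deps, catalog))
```

The rule is conservative: a single library with no known provider makes the prediction "not ready". It also ignores version and ABI compatibility and whether the site's MPI implementation matches the one the binary was built against, all of which a realistic predictor would have to consider.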

The work presented in this dissertation is a first step toward an ideal solution that automatically enables the use of diverse computing resources for computation. The evaluation demonstrated the validity of the solution: it predicted execution readiness correctly more than 90% of the time and, by generating site-specific configurations, enabled 41% more successful executions. An analysis of the effort required, measured as the time spent using the solution, predicts that the solution is, in the best case, an order of magnitude more efficient than current manual methods and, in the worst case, no less efficient.

PHD (Doctor of Philosophy)
parallel computing, migration, automation
All rights reserved (no additional license for public reuse)
Issued Date: