Improving Performance and Fairness for Big Data Job Schedulers in Large-Scale Datacenters

Zhou, Wei, Systems Engineering - School of Engineering and Applied Science, University of Virginia
White, K. Preston, Department of Systems Engineering, University of Virginia


It is a critical challenge to design a highly efficient, high-performance, and fair big data job scheduler, especially in large-scale datacenters consisting of heterogeneous servers under intensive, complex, and diverse workloads. Hybrid job schedulers, which combine a centralized job scheduler with multiple distributed job schedulers, have been considered a promising alternative to the conventional centralized job schedulers deployed in enterprise datacenters. However, our literature survey and experimental study show that (1) state-of-the-art hybrid job schedulers fail to ensure low latency for latency-sensitive short jobs; and (2) state-of-the-art fair job schedulers for constrained jobs fail to ensure fair sharing in heterogeneous-server environments.

To this end, we first address the high job latency that short jobs suffer in hybrid job schedulers due to head-of-line blocking and straggler tasks. We propose Dice, a new general performance optimization framework for hybrid job schedulers that alleviates the high job latency of short jobs. Dice is composed of two simple yet effective techniques, Elastic Sizing and Opportunistic Preemption, both of which keep track of the task waiting times of short jobs. When the mean task waiting time of short jobs is high, Elastic Sizing dynamically and adaptively increases the short partition size to prioritize short jobs over long jobs. Opportunistic Preemption, in turn, preempts resources on demand from long tasks running in the general partition, so as to mitigate the head-of-line blocking of short jobs.

We further propose Eirene, another general performance optimization framework for hybrid job schedulers, which improves the job latency of short jobs via two schemes tightly coupled with the general architecture of hybrid job schedulers. Coordinated Cold Data Migration leverages the high task waiting times of short jobs during heavily-loaded periods: while a task waits, its cold input data is migrated from disk to local memory ahead of the initial input-reading phase, shortening both task runtime and queueing time. Scheduler-Aware Task Cloning, in turn, exploits spare computing resources during lightly-loaded periods and proactively clones tasks of short jobs to mitigate the straggler problem.
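The core control loop of Elastic Sizing can be illustrated as follows. This is a minimal sketch: the function name, the step size, and the threshold-based policy are illustrative assumptions, not details taken from the Dice implementation, which adapts the partition size from observed short-job task waiting times.

```python
# Hypothetical sketch of Elastic Sizing (names and policy are illustrative).
# The short partition grows when short jobs wait too long, and shrinks back
# toward its floor when waiting times are low, returning nodes to long jobs.

def elastic_short_partition(current_size, mean_wait, target_wait,
                            min_size, max_size, step=1):
    """Return the next short-partition size (in worker nodes).

    current_size: current number of nodes in the short partition
    mean_wait:    observed mean task waiting time of short jobs
    target_wait:  waiting-time target above which short jobs get priority
    """
    if mean_wait > target_wait:
        # Short jobs are queueing too long: enlarge the short partition.
        return min(current_size + step, max_size)
    # Waiting time is acceptable: shrink and give capacity back to long jobs.
    return max(current_size - step, min_size)
```

Under this sketch, the scheduler would call the function periodically with fresh waiting-time statistics; Opportunistic Preemption would act between adjustments, reclaiming resources from long tasks when a short task is blocked.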

We then address the unfair scheduling of jobs with placement constraints in heterogeneous environments. We propose Eunomia, a performance-variation-aware fair job scheduler for constrained jobs in heterogeneous datacenters. Eunomia introduces progress share fairness, which aims to equalize the progress shares of jobs as much as possible. The progress share of a job is defined as the ratio between the accumulated progress of the job's scheduled tasks and the maximum accumulated progress its tasks could achieve in the cluster if placement constraints were removed.
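The progress share definition above can be written as a small computation. This is a hedged sketch: the function names and the job representation are assumptions for illustration, not Eunomia's actual interfaces; only the ratio itself follows the definition in the text.

```python
# Illustrative computation of progress share (hypothetical names).
# For each job: share = accumulated progress of scheduled tasks divided by
# the maximum accumulated progress achievable without placement constraints.

def progress_share(scheduled_progress, unconstrained_progress):
    """Ratio in [0, 1] when constraints can only reduce achievable progress."""
    if unconstrained_progress == 0:
        return 0.0
    return scheduled_progress / unconstrained_progress

def share_spread(jobs):
    """Gap between the best- and worst-off jobs; an ideally fair
    schedule under progress share fairness drives this toward zero.

    jobs: mapping of job name -> (scheduled_progress, unconstrained_progress)
    """
    shares = [progress_share(p, m) for p, m in jobs.values()]
    return max(shares) - min(shares)
```

For example, a constrained job restricted to slow servers may accumulate less progress than it could on the unconstrained cluster, lowering its share; a performance-variation-aware scheduler would then favor that job to narrow the spread.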

PhD (Doctor of Philosophy)
big data analytics, resource management, hybrid job scheduler, fair job scheduler
All rights reserved (no additional license for public reuse)
Issued Date: