Scheduling to Ensure Performance and Cost Effectiveness in Power-Modulated Datacenters

Author:
Venkataswamy, Vanamala, Computer Science - School of Engineering and Applied Science, University of Virginia
Grimshaw, Andrew, EN-Comp Science Dept, University of Virginia

Datacenters are critical infrastructure in today's information age. The sustained demand for digital services has led to record datacenter build-outs and increased energy consumption. Modern datacenters rely heavily on brown energy, which poses two significant problems: 1) it is expensive, and 2) it is harmful to the environment, since brown energy generation releases greenhouse gases. Renewables are becoming increasingly accessible energy sources for powering datacenters, leading to dramatically lower energy costs and a significantly reduced climate impact. Green datacenters, also referred to as power-modulated datacenters, can utilize multiple energy sources (wind and solar) by intelligently adapting computing to energy generation. The difficulty with renewables is that power generation is intermittent and subject to frequent fluctuations, which makes job scheduling in such datacenters an interesting research problem. Green datacenters therefore need intelligent systems and system software that adapt to the intermittent power supply from renewables.

Traditional heuristics-based job schedulers use hand-crafted scheduling policies. Hand-engineering domain-specific heuristics-based schedulers to meet specific objective functions in highly dynamic green datacenters is time-consuming, error-prone, and expensive, and it requires domain expertise. Reinforcement Learning (RL) has solved sequential decision-making tasks of impressive difficulty by maximizing reward functions through trial and error. A growing body of research has shown that RL schedulers can learn effective job scheduling policies in traditional datacenter environments with a constant power supply. Although the results demonstrated in existing work are convincing, they do not examine the complexities of dynamic green datacenter environments.
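For reference, a hand-crafted policy of the kind described above can be sketched as a greedy rule that admits jobs only while the currently available renewable power allows it. This is a minimal illustration and not the dissertation's scheduler: the `Job` fields, the `schedule_step` function, and the value-per-kilowatt ranking are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    power_kw: float  # hypothetical: power the job draws while running
    value: float     # hypothetical: revenue earned if the job runs


def schedule_step(queue, available_kw):
    """Greedily admit jobs by value per kW until the power budget is exhausted.

    Returns (running, deferred). Deferred jobs wait for a later timestep,
    when generation from wind or solar may be higher.
    """
    ranked = sorted(queue, key=lambda j: j.value / j.power_kw, reverse=True)
    running, deferred = [], []
    remaining = available_kw
    for job in ranked:
        if job.power_kw <= remaining:
            running.append(job)
            remaining -= job.power_kw
        else:
            deferred.append(job)
    return running, deferred


if __name__ == "__main__":
    jobs = [Job("a", 4.0, 8.0), Job("b", 3.0, 9.0), Job("c", 5.0, 5.0)]
    running, deferred = schedule_step(jobs, available_kw=8.0)
    print([j.name for j in running], [j.name for j in deferred])
```

Even this small rule embeds several fixed design decisions (the ranking key, no preemption, no lookahead on power forecasts), which suggests why tuning such heuristics for fluctuating supply is laborious.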

This dissertation delivers four fundamental contributions. First, we developed a unified green datacenter simulator that is driven by heuristic and RL scheduling policies, runs synthetic or real workloads, and integrates multiple renewable energy sources to power the datacenter. The simulator supports resource scaling (small to medium scale), allowing practitioners to experiment with datacenters of different capacities. Second, we systematically explore RL scheduler design features and demonstrate their performance implications when the scheduler is adequately designed. Third, while many existing RL schedulers optimize effectively for a single objective, they do not address multi-criteria optimization. Moreover, the objectives may be in opposition, e.g., maximizing the total value (revenue) while minimizing the overall job delay. We demonstrate that constrained RL schedulers learn to accomplish such opposing goals and satisfy multi-criteria optimization. Finally, classic online RL job schedulers can learn efficient scheduling strategies but often take hundreds of thousands of timesteps to explore the environment and adapt from a randomly initialized deep neural network (DNN) policy. Offline reinforcement learning, also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. We show that pre-training the RL scheduler agent on prior datasets can short-circuit the random exploration phase, and that the agent continues to improve with online data collection.
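The opposing goals named in the third contribution are commonly expressed as a constrained Markov decision process. The formulation below is the standard textbook form, given here only as an illustration; the abstract does not state the exact objective used, and the symbols $r_t$ (per-step job value), $c_t$ (per-step delay cost), $d$ (delay budget), and $\gamma$ (discount factor) are assumptions introduced for this sketch.

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t} r_t \right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t} c_t \right] \le d
```

Under this reading, the scheduler maximizes expected total value while keeping the expected accumulated delay cost within the budget $d$, rather than folding both goals into a single hand-weighted reward.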

To deliver these contributions, we employed Offline, Online, and Constrained-Controlled RL methods. We evaluated the efficacy of these methods under diverse power supply and load conditions using synthetic and real workloads. This study provides several insights for designing future RL schedulers that ensure performance and cost-effectiveness in power-modulated datacenters.

PHD (Doctor of Philosophy)
Resource Management, Job Scheduling, Reinforcement Learning, Power-Modulated Green Datacenters
Issued Date: