Proactive Resource Management to Ensure Predictable End-to-End Performance for Cloud Applications

Author:
Kim, In Kee, Computer Science - School of Engineering and Applied Science, University of Virginia
Humphrey, Marty, Department of Computer Science, University of Virginia

Public IaaS clouds have become essential infrastructure for enterprises and research organizations to run applications and services because of their attractive capabilities, i.e., scalability, elasticity, resource diversity, and cost efficiency. Predictive resource management systems have been developed to fully leverage such cloud infrastructures with two interrelated goals: maximizing SLA (Service Level Agreement) satisfaction and minimizing execution cost. However, existing predictive approaches are insufficient to meet these two goals because of two uncertainties in public IaaS clouds, namely workload uncertainty and performance uncertainty, and they often show poor accuracy and adaptability in predicting future workloads and guaranteeing performance SLAs. As a result, existing methods incur frequent SLA violations and high execution cost.

This dissertation addresses these problems to achieve proactive resource management that assures the end-to-end performance of cloud applications on public IaaS clouds. This research includes three mechanisms toward this goal.

The first mechanism, CloudInsight, addresses workload uncertainty. CloudInsight is a novel framework for forecasting real-world cloud workloads that leverages the combined power of multiple workload predictors. More specifically, CloudInsight predicts future workload changes by building an ensemble model from multiple workload predictors, whose weights are determined at runtime, using multi-class regression, based on each predictor's accuracy on the current workload. We evaluated CloudInsight with various workload traces from real-world cloud applications. The results show that CloudInsight achieves 13% to 27% better accuracy than state-of-the-art predictors across all traces. Moreover, results from a trace-based simulation with a representative resource management module show that CloudInsight yields 15% to 20% fewer under- and over-provisioning periods, resulting in better cost efficiency and fewer SLA violations than existing predictors.
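The ensemble idea can be sketched as follows. This is a minimal illustration under stated assumptions, not CloudInsight itself: the three base predictors (`last_value`, `moving_average`, `linear_trend`) are hypothetical stand-ins, and instead of multi-class regression the sketch weights each predictor by the inverse of its mean absolute error over a recent window, which is a simpler substitute technique. It does, however, show the core loop of re-scoring predictors on the current workload and recombining their forecasts at runtime.

```python
from typing import Callable, Dict, Sequence

# Hypothetical base predictors; the abstract does not name CloudInsight's own.
def last_value(history: Sequence[float]) -> float:
    """Predict that the next value equals the latest observation."""
    return history[-1]

def moving_average(history: Sequence[float], window: int = 3) -> float:
    """Predict the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def linear_trend(history: Sequence[float]) -> float:
    """Extrapolate the most recent step change one step ahead."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])

PREDICTORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "last_value": last_value,
    "moving_average": moving_average,
    "linear_trend": linear_trend,
}

def ensemble_weights(history: Sequence[float], eval_window: int = 5) -> Dict[str, float]:
    """Weight predictors by inverse MAE on the last `eval_window` points.

    Assumption: inverse-error weighting stands in for the multi-class
    regression described in the abstract.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    start = max(1, len(history) - eval_window)
    errors = {name: 0.0 for name in PREDICTORS}
    for t in range(start, len(history)):
        past = history[:t]  # replay: predict point t from the points before it
        for name, predict in PREDICTORS.items():
            errors[name] += abs(predict(past) - history[t])
    steps = len(history) - start
    # Small epsilon keeps a perfect predictor from dividing by zero.
    inverse = {name: 1.0 / (err / steps + 1e-9) for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}

def ensemble_predict(history: Sequence[float], eval_window: int = 5) -> float:
    """Weighted combination of every base predictor's next-step forecast."""
    weights = ensemble_weights(history, eval_window)
    return sum(w * PREDICTORS[name](history) for name, w in weights.items())
```

On a steadily rising trace such as `[10, 20, 30, 40, 50, 60, 70]`, the trend predictor makes no recent errors, so it receives almost all of the weight and the ensemble forecast is very close to 80. When the workload pattern shifts, the weights shift with it on the next call.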

The second mechanism, Orchestra, handles performance uncertainty. Orchestra is a cloud-specific framework for controlling multiple cloud applications in user space, with the aim of meeting their respective SLAs. Orchestra takes an online approach with lightweight monitoring and builds performance models for multiple cloud applications on the fly. It then optimizes the allocation of shared resources (e.g., CPU, memory, IO, network) and controls these resources to satisfy SLAs. We evaluated Orchestra on a production cloud (Amazon EC2) with a diverse range of SLA requirements. The results show that Orchestra ensures that latency-sensitive, user-facing cloud applications (e.g., Web, DBMS) meet their SLA requirements at all times. We also measured the accuracy of the performance models in the Orchestra framework; their errors in estimating the performance of cloud applications are often below 10%.
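A minimal sketch of the model-then-allocate loop, assuming a hypothetical per-application model in which latency depends only on CPU share as `latency = a + b / share`, fitted online by least squares from monitoring samples. Orchestra's actual model form and its memory, IO, and network controls are not reproduced here; `PerfModel` and `allocate` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class PerfModel:
    """Hypothetical online model: latency_ms = a + b / cpu_share.

    Keeps running sums so each new monitoring sample is an O(1) update
    of an ordinary least-squares fit on x = 1 / cpu_share.
    """
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def observe(self, cpu_share: float, latency_ms: float) -> None:
        x = 1.0 / cpu_share
        self.n += 1
        self.sum_x += x
        self.sum_y += latency_ms
        self.sum_xx += x * x
        self.sum_xy += x * latency_ms

    def coefficients(self) -> Tuple[float, float]:
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom == 0:
            raise ValueError("need samples at two or more distinct shares")
        b = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        a = (self.sum_y - b * self.sum_x) / self.n
        return a, b

    def predict(self, cpu_share: float) -> float:
        a, b = self.coefficients()
        return a + b / cpu_share

    def min_share_for(self, sla_ms: float) -> float:
        """Smallest CPU share whose predicted latency meets the SLA."""
        a, b = self.coefficients()
        if sla_ms <= a:
            raise ValueError("SLA unreachable under this model")
        return max(0.0, b / (sla_ms - a))

def allocate(models: Dict[str, PerfModel], slas_ms: Dict[str, float],
             capacity: float = 1.0) -> Tuple[Dict[str, float], float]:
    """Give each SLA-bound app its minimum share; return leftover capacity."""
    required = {name: models[name].min_share_for(sla) for name, sla in slas_ms.items()}
    total = sum(required.values())
    if total > capacity:
        raise RuntimeError("SLAs cannot all be met on this host")
    return required, capacity - total
```

For example, a Web model fitted from samples `(0.25, 45 ms)`, `(0.5, 25 ms)`, `(1.0, 15 ms)` recovers `a = 5`, `b = 10`, so a 25 ms SLA needs a 0.5 CPU share. Sizing latency-sensitive applications first and returning the spare capacity mirrors the stated goal of satisfying SLAs before anything else; how that spare is used (e.g., for batch work) is left open here.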

In addition to the mechanisms that address the two main uncertainties in public clouds, we present a new cloud simulator, PICS, that supports large-scale performance evaluation of cloud applications and resource management systems in a short amount of time. PICS enables cloud end-users to evaluate the cost and performance of public IaaS clouds along dimensions such as VM and storage services, resource scaling, job scheduling, and diverse workload patterns. We extensively validated PICS by comparing its results with data acquired from a real public IaaS cloud running real cloud applications. We show that PICS provides highly accurate simulation results (less than 5% average simulation error) under a variety of use cases. Furthermore, we evaluated the sensitivity of PICS to imprecise simulation parameters. The results show that PICS still provides reliable simulation results despite inaccurate simulation parameters and performance uncertainty.
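To illustrate the kind of cost and performance question a simulator like PICS answers, here is a toy trace-driven simulation. It assumes hourly VM billing, a fixed per-VM capacity, and a reactive scaling policy that sizes the next hour's fleet from the current hour's demand; these assumptions and the names `simulate` and `mean_relative_error` are hypothetical and far simpler than PICS, which also models storage services and job scheduling. The error helper shows one way an average simulation error against measured data could be computed.

```python
import math
from dataclasses import dataclass
from typing import Sequence

@dataclass
class SimResult:
    cost: float           # total billed cost
    vm_hours: int         # total VM-hours billed
    violation_hours: int  # hours where demand exceeded provisioned capacity

def simulate(load: Sequence[float], vm_capacity: float,
             price_per_hour: float, initial_vms: int = 1) -> SimResult:
    """Replay an hourly demand trace under hypothetical reactive scaling."""
    vms = initial_vms
    cost, vm_hours, violations = 0.0, 0, 0
    for demand in load:
        vm_hours += vms
        cost += vms * price_per_hour
        if demand > vms * vm_capacity:
            violations += 1  # under-provisioned this hour
        # Scaling decision lags by one hour: size next hour from current demand.
        vms = max(1, math.ceil(demand / vm_capacity))
    return SimResult(cost, vm_hours, violations)

def mean_relative_error(simulated: Sequence[float], measured: Sequence[float]) -> float:
    """Average of |sim - real| / |real| across paired observations."""
    return sum(abs(s - m) / abs(m) for s, m in zip(simulated, measured)) / len(measured)
```

With the trace `[100, 250, 250, 80]`, a capacity of 100 per VM, and a price of 0.5 per VM-hour, the one-hour scaling lag causes a single violation during the jump to 250, and the run bills 8 VM-hours for a cost of 4.0.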

PHD (Doctor of Philosophy)
Cloud Computing, IaaS Cloud, Resource and Application Management, Workload Prediction, Application Performance Model, Resource Control, Cloud Simulation, Predictive Scheduling
All rights reserved (no additional license for public reuse)
Issued Date: