Operations Engineering: How Monitoring Maintains Uptime; The Growing Demand for Green Cloud Computing

Author:
Jiang, Peter, School of Engineering and Applied Science, University of Virginia
Advisors:
Francisco, Pedro Augusto, EN-Engineering and Society, University of Virginia
Morrison, Briana, EN-Comp Science Dept, University of Virginia
Vrugtman, Rosanne, EN-Comp Science Dept, University of Virginia
Abstract:

The amount of data collected and stored by large websites has grown exponentially since the dawn of the internet, yet the systems that manage this data are often treated as a black box. As part of my technical research, I examined how Operations Engineering maintains high availability: a yearly uptime target for an application, usually above 99.9%. During my summer internship, I explored Operations Engineering in practice and observed its importance in maintaining uptime. Data storage and processing also play a large part in the infrastructure of modern internet systems, especially through cloud computing services. Because cloud computing consumes a significant amount of energy and its usage has exploded in recent years, finding more energy-efficient technologies to meet the demand for cloud resources has become increasingly important. Both of these areas underpin the infrastructure and daily operations of many large internet applications. As the number of internet users worldwide has grown, their importance has skyrocketed, making them valuable topics to understand and research.
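
To make the 99.9% figure concrete: an availability target fixes how much downtime a service may accumulate in a year. The short Python sketch below is an illustration only, assuming a 365-day year and ignoring planned maintenance windows; it converts an uptime percentage into that yearly downtime budget. The function name `downtime_budget_hours` is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year of 8,760 hours


def downtime_budget_hours(availability_pct: float) -> float:
    """Return the maximum yearly downtime, in hours, allowed by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    # The unavailable fraction of the year is whatever the target does not cover.
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = downtime_budget_hours(target)
        print(f"{target}% uptime allows {hours:.2f} h ({hours * 60:.1f} min) of downtime per year")
```

At 99.9%, the budget works out to 8,760 h × 0.001 ≈ 8.76 hours per year; each additional "nine" shrinks it tenfold.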

The value of Operations Engineering depends largely on the scale of the application. For instance, an application serving a million users will value uptime significantly more than an application with a thousand users, and for large applications unexpected downtime can translate into millions of dollars in losses. My project therefore contributed to maximizing uptime by creating dashboards: centralized views that display errors, requests, and overall application health. To build them, I used Splunk, a log search and analytics platform, and its Search Processing Language (SPL) to search through log data and create visualizations. These visualizations surface the information relevant to a given application and significantly reduce the time developers spend debugging.
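
The kind of aggregation behind a dashboard panel can be sketched in Python. This is not Splunk or SPL; it is a minimal illustration assuming a hypothetical plain-text log format of `<timestamp> <LEVEL> <service> <message>`, and the names `SAMPLE_LOGS` and `summarize` are likewise hypothetical. It counts log events and errors per service so that a spike in a service's error rate stands out.

```python
from collections import Counter

# Hypothetical log lines, assumed for illustration only:
# "<timestamp> <LEVEL> <service> <message>"
SAMPLE_LOGS = [
    "2023-07-01T12:00:00Z INFO checkout request completed",
    "2023-07-01T12:00:01Z ERROR checkout payment provider timeout",
    "2023-07-01T12:00:02Z INFO search request completed",
    "2023-07-01T12:00:03Z INFO checkout request completed",
    "2023-07-01T12:00:04Z ERROR search index unavailable",
    "2023-07-01T12:00:05Z INFO search request completed",
    "2023-07-01T12:00:06Z INFO search request completed",
]


def summarize(lines):
    """Count log events and errors per service, as a dashboard panel might."""
    events = Counter()
    errors = Counter()
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 3:
            continue  # skip malformed lines instead of failing the whole summary
        _timestamp, level, service = parts[:3]
        events[service] += 1
        if level == "ERROR":
            errors[service] += 1
    return {
        service: {
            "events": events[service],
            "errors": errors[service],
            "error_rate": errors[service] / events[service],
        }
        for service in events
    }


if __name__ == "__main__":
    for service, stats in sorted(summarize(SAMPLE_LOGS).items()):
        print(f"{service}: {stats['errors']}/{stats['events']} errors "
              f"({stats['error_rate']:.0%})")
```

In a real deployment the log platform performs this grouping over indexed data and renders it as charts; the sketch only shows the underlying counting.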

During my project, I built custom dashboards for multiple applications within the company. These dashboards enable rapid responses to incidents, during which an error must be traced and resolved quickly. Without them, developers would have to query and sort through the logs manually, consuming valuable time during an outage. After the dashboard project, the Operations Engineering team would continue improving workflows in other areas, such as resolving incidents, building new pipelines, and creating additional dashboards. Through this work, I gained practical insight into managing and resolving downtime.

The other half of my research focuses on the environmental effects of cloud computing, a field known as green cloud computing. Cloud computing is a modern service that offers Information Technology (IT) resources over the internet, reducing the setup effort and up-front costs for users. However, cloud computing has become so widespread that these services now account for 1% of global energy consumption annually. I therefore researched how the demand for internet resources influences the growth of green cloud computing. As part of my analysis, I evaluated several case studies and performed a literature review on green cloud computing to determine the social, economic, and environmental factors that define the current state of cloud computing. I further analyzed these factors through the Social Construction of Technology (SCOT) framework, since cloud computing is a technology shaped by the demands of its end users.

Throughout my research, I encountered literature describing various aspects of green cloud computing. These sources covered methods of incorporating greener technologies in cloud data centers, the future of work within data centers, the forces driving green innovation and products, and data tracking the overall demand for data over time. Analyzing these sources, I found that the growth of cloud computing has followed a trend similar to the growth of the internet, which in turn has increased demand for greener technologies within cloud computing. Additionally, because cloud computing providers are driven mainly by profit, customers must demand green services to create much-needed competition among providers. One possibility, raised by a case study of the environmental performance of several data centers, is reusing waste heat for district heating. Government regulation could also help close gaps in current emissions reporting and mitigate the environmental costs of older data centers. Taken together, the evidence and current trends suggest that cloud computing will likely become more sustainable in the near future, especially if consumers demand this change.

Degree:
BS (Bachelor of Science)
Keywords:
Cloud Computing, Operations Engineering, Green Technology, Data Center
Notes:

School of Engineering and Applied Science
Bachelor of Science in Computer Science
Technical Advisor: Briana Morrison
STS Advisor: Pedro Francisco
Technical Team Members: None

Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2024/05/08