Abstract
The STS Research Paper and Capstone Project, although both software related,
examined two different areas of Computer Science and Software Engineering. The STS Research
Paper looked at the societal and technological implications of businesses shifting the
responsibility of managing their compute resources to cloud providers like AWS or Microsoft
Azure. The Capstone Project was about extending an existing benchmark suite (developed at
UVA) that is used for evaluating parallel computing accelerators. The work updated the
benchmark suite to add direct support for running on AMD GPUs using APIs developed by
AMD, replacing the previous OpenCL-based approach. This opens the door to better
performance and provides an opportunity to evaluate the tools AMD has developed for
writing code for its hardware.
The STS Research Paper looks specifically at how the growth of cloud providers has led
to a world where software systems are more brittle, and failures impact many services and
companies rather than being isolated to a single system (such isolation being one of the main
advantages of the distributed nature of the internet). The main technique used for the analysis was
Actor-Network Theory, which allows us to describe the various layers between end-users, the
companies that sell software services, and cloud providers, and how those layers of interaction
result in different expectations and incentives for different actors in the network. The paper looks
at cases such as major cloud provider outages, and software failures caused by manufacturing
defects in hardware, to analyze how responsibility for fixing issues was allocated, as well as how
the public and media reacted to the failures. The paper uses these case studies and
Actor-Network Theory to argue that the current system essentially incentivizes software
companies to put all their eggs in a few baskets (the cloud providers).
The Capstone Project focused on the specific task of extending the Rodinia benchmark
suite to better support AMD hardware. Rodinia’s benchmarks were implemented in CUDA,
which targets Nvidia hardware, and in OpenCL for other hardware. Because OpenCL is designed
to be generic, OpenCL code tends to be slower than specialized code. AMD has developed a set of
tools to assist developers in porting CUDA code to ROCm, its CUDA equivalent, since it wants to
position itself as an alternative to Nvidia. This has become especially important recently, as
Nvidia hardware is the default choice for machine learning workloads and the demand for its
GPUs outstrips supply. The main goal of the project was therefore to evaluate how
effective the tools AMD built are. During the project, 10 benchmarks were successfully ported to
run on AMD hardware. For the majority of the benchmarks, AMD tooling was effective for
porting, requiring only minimal additional manual work. However, some workloads required a
moderate amount of additional effort to run on AMD hardware, since the CUDA
implementations were more specialized to run on Nvidia GPUs. The performance of the AMD
versions also lagged behind the original Nvidia implementations. So, overall, AMD’s tooling
works for getting code designed to run on Nvidia hardware to work on AMD, but getting the
code to run at full performance on AMD requires additional investment, as well as the
ongoing cost of maintaining two separate implementations.
Although the STS research paper and capstone topic were quite different, they both look
at issues relating to vendor lock-in versus building systems that are meant to run anywhere. The
research paper focuses on the costs of software companies depending on a single cloud
provider for their compute infrastructure. The capstone project, on the other hand, added
nuance to the issue of depending on a single vendor by showing, through a practical example,
that maintaining a system that isn’t tied to one vendor adds significant maintenance cost to a
software project.
Overall, working on these two projects gave me insight into building software and the tradeoffs that engineers need to make.