Abstract
My technical project and my STS research paper are connected through the shared problem of GPU benchmark modernization. Both address how contemporary GPU systems are evaluated, but they approach the question differently. My technical project investigates how GPU benchmarking can be improved by updating benchmark design and content, so that it better accounts for modern GPUs' workload requirements, runtime behavior, and architectural differences. My STS paper, on the other hand, tries to explain why researchers continue to use benchmarks that they know to be problematic. Both papers therefore deal with benchmark change: one focuses on how it can be achieved, and the other explains why it is often constrained.
My technical report proposes a systematic method for updating GPU benchmark suites through updated workloads, improved algorithms, changed data set characteristics, and evaluation techniques that reflect current GPUs. Legacy benchmark suites such as Rodinia were built around older hardware and software, and they ignore many architectural aspects of modern GPUs, such as increased parallelism and a more complex memory hierarchy. My research addresses these deficiencies by changing workload characteristics, modifying algorithms to incorporate modern parallel computing considerations, and increasing data set sizes so that they challenge current GPUs. The evaluation technique is also modified to represent performance characteristics and differences more accurately: it separates kernel-level performance from overall performance and incorporates additional GPU performance metrics, such as cache behavior and occupancy.
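As a minimal sketch of what separating kernel-level from overall performance can look like, the Python snippet below computes two simple quantities from timing data that is assumed to have been collected already by a profiler. The names `RunTiming`, `kernel_fraction`, and `theoretical_occupancy` are hypothetical and chosen only for illustration; they are not taken from the technical report. Occupancy is computed using its usual definition, the ratio of active warps per multiprocessor to the maximum the hardware supports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunTiming:
    """Timing data for one benchmark run (hypothetical structure)."""
    total_ms: float          # end-to-end time, including transfers and setup
    kernel_ms: List[float]   # execution time of each GPU kernel launch


def kernel_fraction(run: RunTiming) -> float:
    """Share of end-to-end time spent inside GPU kernels."""
    if run.total_ms <= 0:
        raise ValueError("total_ms must be positive")
    return sum(run.kernel_ms) / run.total_ms


def theoretical_occupancy(active_warps_per_sm: int, max_warps_per_sm: int) -> float:
    """Ratio of active warps to the maximum warps a multiprocessor supports."""
    if max_warps_per_sm <= 0:
        raise ValueError("max_warps_per_sm must be positive")
    return active_warps_per_sm / max_warps_per_sm


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    run = RunTiming(total_ms=100.0, kernel_ms=[20.0, 30.0])
    print(f"kernel share of runtime: {kernel_fraction(run):.2f}")
    print(f"theoretical occupancy:   {theoretical_occupancy(32, 64):.2f}")
```

Reporting the kernel share alongside end-to-end time makes it visible when a benchmark is dominated by data transfer or setup rather than by GPU computation, which a single overall runtime would hide.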
My STS research paper examines how benchmarking norms persist once they become institutionalized. Drawing on Thomas P. Hughes’s theory of technological momentum, I argue that mature sociotechnical systems become difficult to redirect as infrastructure, routines, and expectations accumulate around them. To illustrate this point, I present Altis and Cactus, two benchmarking projects that aim to modernize GPU benchmarking yet remain bound to legacy workloads, routines, and conventions. I argue that this legacy persistence is caused by three factors: norms of comparability institutionalize older workloads as a standard of reference; simulator and infrastructure constraints favor the installed base of kernels that are already portable; and evaluation pipelines reinforce continuity through feedback loops.
Working on these two projects together made each one stronger. My technical project helped me understand why benchmark modernization is necessary and what specific limitations exist in older benchmark suites. At the same time, my STS research showed me that technical improvement alone is not enough to change evaluation practice. A benchmark suite also has to fit within an existing research community shaped by shared norms, tools, and expectations. This changed how I think about technical design. I now see benchmark modernization not only as a matter of creating better workloads, but also as a sociotechnical challenge involving adoption, legitimacy, and continuity. Together, these projects helped me see that successful engineering work depends not only on technical improvement, but also on understanding the systems of practice in which new designs must operate.