Open-Source Speech-To-Text Model Comparison for Game Applications; Analysis on the Role of Machine Learning in Enhancing Online Foreign Language Acquisition

Author:
Pham, Jason, School of Engineering and Applied Science, University of Virginia
Advisor:
Seabrook, Bryn, EN-Engineering and Society, University of Virginia
Abstract:

Introduction
Although my Capstone project and STS research project address different fields, gaming technology and language learning respectively, both center on how machine learning (ML) can improve the user experience. The capstone project benchmarks open-source speech-to-text (STT) models in order to develop an in-game overlay that lets users query game facts seamlessly through voice activation. My STS research, on the other hand, examined how ML tools affect online language learning and whether they improve or hinder the user’s experience compared to the traditional in-person learning environment. Although the two use cases differ, both fall under the broader question of how ML technologies can be integrated into society to enhance existing user experiences.

Capstone Summary
Video games have an abundance of online resources to help players understand the game, but it can be very difficult to organize and retrieve the desired information quickly. To create an application that provides seamless voice-activated in-game lookups, my team and I benchmarked multiple open-source speech-to-text models to find the most efficient model in the context of game speech. We evaluated the models on word error rate, character error rate, precision, recall, accuracy, and transcription time. Benchmarks were run on the LJSpeech dataset and on a custom dataset of recorded readings from general, technical, and game texts. The results show that OpenAI’s Whisper performed best on all the datasets in terms of transcription accuracy, with relatively fast transcription time. With an effective speech-to-text model selected, the next step toward the voice-activated overlay application is to select a target game whose online databank is easily accessible through an API, so that voice-activated queries can be made against it.
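As an illustration of the first two metrics named above, the following is a minimal sketch of how word error rate and character error rate are commonly computed: the edit distance between a reference transcript and a model transcript, divided by the reference length. This is not the capstone team's benchmark code; the function names, the lowercasing step, and the sample phrases are assumptions made for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    ref_chars = list(reference.lower())
    hyp_chars = list(hypothesis.lower())
    if not ref_chars:
        raise ValueError("reference transcript must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical game-query phrase and a transcript that drops one word.
    print(wer("open the item shop", "open item shop"))
    print(cer("shop", "ship"))
```

Both metrics can exceed 1.0 when the model transcript contains many extra tokens, so lower values indicate better transcription and 0.0 indicates an exact match after lowercasing.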

STS Research Summary
As reported by the large online language-learning platform Babbel, online language learners have a high failure rate of 70% to 90%. Given the rapid advancement of machine learning (ML) technology, this research addresses this high failure rate by examining the impact ML is having on the field. The main research question is: How has the development of machine learning technology impacted online foreign language acquisition? Using Bruno Latour's actor-network theory (ANT) as the framework, this study analyzes the complex interactions between human learners and nonhuman ML technologies within sociotechnical systems. Through documentary research and network analysis of academic literature, the study anticipates patterns that will highlight the current role of ML technology in enhancing user engagement and improving learning outcomes. This research is significant to the fields of Science and Technology Studies (STS) and Engineering, as it expands the understanding of how emerging ML technologies can reshape educational practices. By focusing on ML in language learning, this study contributes to the ongoing discussion about the effectiveness of digital learning environments and the necessity of human intervention in technology-driven education.

Conclusion
Working on both of these projects offered unique insights into how ML can be implemented in society across different domains, here gaming and language education. While the capstone project was far more technical, consisting of model benchmarking and evaluation, the STS project made me realize the social implications of ML technology and how it can shift society for better and for worse. One key takeaway from working on these two projects simultaneously, which I would not have gained otherwise, is that an engineer working with these new technologies must not only study how to build them; it is equally, or perhaps more, critical to think about their implications for the current social system. Engineers must consider the impacts that introducing these technologies has on existing systems and whether they will hinder or enhance them.

Degree:
BS (Bachelor of Science)
Keywords:
Machine Learning, Online Language Learning, Actor-Network Theory, Educational Technology, Large Language Model
Notes:

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Rosanne Vrugtman

STS Advisor: Bryn Seabrook

Technical Team Members: Aj Nye, Alex Yung

Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2025/04/30