LLM-Augmented Data Automation: Structuring Regulatory Filings for Pipeline Intelligence; Digital Demand: The Macroeconomic and Equity Implications of AI-Driven Electricity Consumption

Le, Jackson

Author

Le, Jackson, School of Engineering and Applied Science, University of Virginia

Advisors

Ripley, Karina, EN-Engineering and Society, University of Virginia
Vrugtman, Rosanne, EN-Comp Science Dept, University of Virginia
Morrison, Briana, EN-Comp Science Dept, University of Virginia

Abstract

Sociotechnical Synthesis 
Capstone Research:
My Capstone project, completed during my internship at Arbo, built an LLM-augmented automation pipeline that processes unstructured metadata from regulatory filings, scoped to filings submitted to the Federal Energy Regulatory Commission (FERC). These filings contain information about natural gas and LNG infrastructure projects, such as milestones, authorization types, request dates, and in-service timelines, but their unstructured and inconsistent formatting made manual curation slow and resource-intensive for the company’s analysts.
To address this, I designed a multi-layered system using Python, Django, and OpenAI’s API. The pipeline preprocesses documents by extracting existing metadata from Arbo’s database, then uses an LLM-based routing layer to classify each filing by type. A cleaning layer chunks lengthy documents and summarizes them to stay within token limits and reduce the risk of hallucination, and a processing layer extracts the targeted metadata into structured JSON. Finally, a post-processing layer stores the result in Arbo’s database. The main challenges were inconsistent terminology across filings, unreliable LLM outputs, and long documents. I addressed them with context engineering, document chunking, and a modular workflow design. In testing, extraction time dropped from around fifteen minutes per filing to under two minutes, with only minor errors across eighty documents. The project laid the groundwork for Arbo to expand LLM-based automation to additional regulatory document classes and, eventually, toward multi-agent systems for real-time pipeline intelligence.
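The layered flow described above (routing, chunking and summarizing, then JSON extraction) can be illustrated with a minimal sketch. This is not Arbo's implementation: the function names, prompts, filing-type labels, chunk sizes, and the `stub_llm` stand-in are all assumptions made for illustration, and the model is passed in as a plain callable so that no particular API signature is implied. Only the four target fields come from the description above.

```python
import json
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a model reply.
LLM = Callable[[str], str]

# Target fields named in the abstract; the key spellings are assumptions.
FIELDS = ["milestones", "authorization_type", "request_date", "in_service_date"]
# Hypothetical filing-type labels for the routing layer.
KNOWN_TYPES = {"certificate_application", "status_report"}


def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows to respect token limits."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks, start = [], 0
    while True:
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            return chunks
        start += max_chars - overlap


def route(text: str, llm: LLM) -> str:
    """Routing layer: classify the filing, falling back to 'unknown'."""
    label = llm("Classify this filing:\n" + text[:1000]).strip().lower()
    return label if label in KNOWN_TYPES else "unknown"


def summarize(chunks: list[str], llm: LLM) -> str:
    """Cleaning layer: summarize each chunk and join the summaries."""
    return "\n".join(llm("Summarize:\n" + chunk) for chunk in chunks)


def extract(summary: str, filing_type: str, llm: LLM) -> dict:
    """Processing layer: request JSON and keep only the expected fields."""
    raw = llm(f"Extract {FIELDS} as JSON from this {filing_type} filing:\n{summary}")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Model output is not trusted: flag it instead of storing it.
        return {"filing_type": filing_type, "error": "invalid_json"}
    return {"filing_type": filing_type, **{key: data.get(key) for key in FIELDS}}


def run_pipeline(text: str, llm: LLM) -> dict:
    """Run routing, cleaning, and processing in order."""
    filing_type = route(text, llm)
    summary = summarize(chunk_text(text), llm)
    return extract(summary, filing_type, llm)


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns fixed placeholder replies."""
    if prompt.startswith("Classify"):
        return "certificate_application"
    if prompt.startswith("Summarize"):
        return "placeholder summary"
    return json.dumps({"request_date": "PLACEHOLDER", "unexpected_key": 1})
```

Calling `run_pipeline(document_text, stub_llm)` exercises the whole flow without network access. Keeping each layer as a separate function mirrors the modular design mentioned above, and the whitelist in `extract` is one simple way to keep malformed or unexpected model output out of the stored record.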
STS Research:
My STS paper, inspired by the energy domain I worked in during my capstone, examines how the rapid expansion of AI and cloud computing is changing electricity demand in the United States and what the consequences are for pricing and economic equity. Using primarily the Social Construction of Technology (SCOT) framework, I analyze how relevant social groups, such as large tech companies, utilities, regulators, and everyday ratepayers, interpret and shape data center expansion in different ways. Tech companies see it as a business opportunity, utilities see it as a way to expand their rate base and increase shareholder returns, and regulators are still working out how to adapt existing frameworks to unprecedented demand. Meanwhile, low-income households struggle to pay their bills as electricity costs rise, driven mainly by infrastructure costs they did not create.
Central to my argument is the erosion of the cost causation principle, which holds that infrastructure costs should be allocated to the customers who generate them. Drawing on utility rate case filings from Virginia, Pennsylvania, and Wisconsin, as well as industry projections and federal reports, I show how utilities are socializing data center infrastructure costs across general ratepayers while sometimes offering confidential, discounted rates to tech firms. SCOT’s concept of interpretive flexibility helps explain why the same expansion is framed as economic progress by utilities and tech companies but as a cost burden by ordinary ratepayers. The concept of closure is also important: ongoing regulatory proceedings in various states show that no consensus has been reached on how these costs should be distributed. My research argues that without stronger transparency requirements and rate structures that hold tech companies accountable, the current trajectory will widen economic inequality as the public continues to absorb the costs of technological growth.

Degree

BS (Bachelor of Science)

Keywords

Artificial Intelligence; Electricity; Large Language Model; Utility; Cloud Computing

Notes

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisors: Briana Morrison, Rosanne Vrugtman

STS Advisor: Karina Ripley

Language

English

Rights

Attribution 4.0 International (CC BY)

Issued Date

2026-05-05

Persistent Link

https://doi.org/10.18130/pr02-x570

Suggested Citation

Le, Jackson. LLM-Augmented Data Automation: Structuring Regulatory Filings for Pipeline Intelligence; Digital Demand: The Macroeconomic and Equity Implications of AI-Driven Electricity Consumption. University of Virginia, School of Engineering and Applied Science, BS (Bachelor of Science), 2026-05-05, https://doi.org/10.18130/pr02-x570.

Files

Le_Jackson_Prospectus.pdf

Le_Jackson_STSResearchPaper.pdf

Le_Jackson_SociotechnicalSynthesis.pdf

Le_Jackson_TechnicalReport.pdf
