Polars: Using the Polars Library to Efficiently Process Large Datasets; Informate Don’t Automate: An Education-First Framework for AI Integration in Software Engineering

Sribar, Ryan

Author

Sribar, Ryan, School of Engineering and Applied Science, University of Virginia

Advisors

Earle, Joshua, EN-Engineering and Society, University of Virginia
Vrugtman, Rosanne, EN-Comp Science Dept, University of Virginia

Abstract

Software engineering as a profession is undergoing a significant transformation; modern tools available to engineers are changing what the work looks like, redefining expertise, and reshaping how software is written. Across this year, my technical and sociotechnical projects each examined a different facet of this shift: the former a hands-on engineering effort to modernize a legacy system, the latter a study of how the profession itself is being redefined by AI. Together, they reflect the recurring theme that tools reshape work, but the value of the engineer lies in deciding how.

My technical report, Polars: Using the Polars Library to Efficiently Process Large Datasets, describes a project I built during my summer internship at a financial services firm. A daily data pipeline was responsible for transferring tradable security information (identifiers, currency, country, and risk statistics for stocks, bonds, and mutual funds) from vendor tables into the company's internal database before each trading day began. The existing implementation, written in legacy SQL, took three to five hours to complete, had to be restarted from the beginning whenever it encountered a serious error, and was difficult to maintain because the original authors had left the company and the script's complex logic was poorly documented. Because the data needed to be available before the market opened, these limitations posed a recurring operational risk to account reconciliation and trading. I replaced the script with a Python implementation built around the Polars dataframe library, chosen for its strong benchmark performance on the types of workloads the script required. Tables were loaded as LazyFrames so that Polars' query optimizer could plan joins, filters, and column selections efficiently before materializing results. Because the underlying PostgreSQL database lacked a native MERGE statement, upserts were implemented manually using INSERT statements with ON CONFLICT clauses through psycopg2. The script was deployed through the company's internal Kubernetes-based long-running-task framework, which exposed it as an API endpoint, scheduled it via cron, and provided automatic retries. The new pipeline reduced execution time from roughly three and a half hours to between three and six minutes while improving error recovery, readability, and maintainability.
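The manual upsert described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the securities table, its columns, and the names build_upsert and demo are hypothetical, since the report summary does not give the actual schema or code. So that the sketch runs without a database server, it executes against Python's built-in sqlite3 module, which accepts the same INSERT ... ON CONFLICT ... DO UPDATE form; against PostgreSQL through psycopg2 the placeholder would be %s rather than ?.

```python
import sqlite3


def build_upsert(table, columns, key_columns, placeholder="%s"):
    """Build an INSERT ... ON CONFLICT ... DO UPDATE statement.

    Table and column names are interpolated directly into the SQL text, so
    they must come from trusted code, never from user input. Row values are
    always passed separately through placeholders.
    """
    update_columns = [c for c in columns if c not in key_columns]
    if not update_columns:
        raise ValueError("need at least one non-key column to update")
    col_list = ", ".join(columns)
    value_list = ", ".join([placeholder] * len(columns))
    key_list = ", ".join(key_columns)
    # EXCLUDED refers to the row that was proposed for insertion.
    assignments = ", ".join(f"{c} = EXCLUDED.{c}" for c in update_columns)
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({value_list}) "
        f"ON CONFLICT ({key_list}) DO UPDATE SET {assignments}"
    )


def demo():
    """Run two batches through the upsert against an in-memory SQLite table.

    SQLite is a stand-in here (assumption); the hypothetical schema below is
    not the one used in the original project.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE securities ("
        "security_id TEXT PRIMARY KEY, currency TEXT, country TEXT)"
    )
    sql = build_upsert(
        "securities",
        ["security_id", "currency", "country"],
        ["security_id"],
        placeholder="?",  # sqlite3 uses ?, psycopg2 uses %s
    )
    # First load: two new rows.
    conn.executemany(sql, [("AAA", "USD", "US"), ("BBB", "EUR", "DE")])
    # Second load: BBB already exists and is updated, CCC is inserted.
    conn.executemany(sql, [("BBB", "EUR", "FR"), ("CCC", "JPY", "JP")])
    conn.commit()
    rows = conn.execute(
        "SELECT security_id, currency, country "
        "FROM securities ORDER BY security_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

With psycopg2, the statement built with the default %s placeholder could be passed to a cursor's executemany call together with the row tuples of a collected Polars dataframe. That wiring is left out here because it depends on the real schema and connection settings.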
In retrospect, the project is a clear instance of what Shoshana Zuboff would call informating rather than automating: no one's role was eliminated, but the team gained a far deeper understanding of a previously opaque, business-critical process.

That distinction sits at the center of my STS research paper, Informate Don't Automate: An Education-First Framework for AI Integration in Software Engineering. The rapid integration of AI into software workflows has created widespread uncertainty about the future of the profession. While vendors and corporate leaders frame these tools as engines of productivity, critics warn of worker displacement, the deskilling of junior engineers, and the emergence of codebases so dependent on AI that they function as “black boxes” their human authors no longer fully understand. Drawing on Langdon Winner's argument that artifacts carry politics, Zuboff's distinction between automating and informating, and Actor-Network Theory as articulated by Sergio Sismondo, I treat AI assistants as active participants in the engineering workplace rather than neutral tools. The study combines Critical Discourse Analysis of industry surveys, corporate communications, and trade press with semi-structured interviews of engineering managers across several firms. Findings reveal that the profession is in the midst of a transition. Hiring is shifting toward whiteboard assessments and AI fluency, expertise is being redefined around judgment and verification rather than syntax, and headcounts have remained stable as productivity gains are absorbed by previously cost-prohibitive work. The most acute risk falls on early-career engineers, whose foundational skills are most easily commodified. The paper argues that the path forward is deliberate informating. I propose a "sandwich method" of human intent, AI generation, and rigorous human verification, treating all AI output as unverified until reviewed, so that engineers retain the deep understanding their organizations will eventually depend on.

Conducting the research for my STS paper helped me refine my perspective on my own experiences. Self-reported speedups, headcount stability, and the discipline of verification are not outcomes that tools produce on their own; rather, they are choices organizations make, and the Polars project embodied those same choices. Together, the two projects reinforced that technical and managerial decisions continually shape each other, and that the engineer's enduring value lies in the judgment that connects them.

Degree

BS (Bachelor of Science)

Keywords

AI-assisted software development; Vibe coding; Developer productivity; Polars Python library; Data pipeline optimization; Pandas vs Polars; Education-first AI integration; Software Engineering

Notes

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Rosanne Vrugtman (CS 4991)

STS Advisor: Joshua Earle

Technical Team Members: N/A (CS 4991)

Language

English

Rights

Issued Date

2026-05-13

Persistent Link

https://doi.org/10.18130/ajw7-dy38

Suggested Citation

Sribar, Ryan. Polars: Using the Polars Library to Efficiently Process Large Datasets; Informate Don’t Automate: An Education-First Framework for AI Integration in Software Engineering. University of Virginia, School of Engineering and Applied Science, BS (Bachelor of Science), 2026-05-13, https://doi.org/10.18130/ajw7-dy38.

Files

This item is restricted to UVA until 2031-05-13.