Uncovering Signatures of Viral Sequences by Interpreting CNN Models using Integrated Gradients; Between Code and Cell: Collaborations and Conflicts over AI in Biological Research

Bai, Yili

Author

Bai, Yili, School of Engineering and Applied Science, University of Virginia

Advisors

Norton, Peter, EN-Engineering and Society, University of Virginia
Warren, Andrew, PV-BII-Biocomplexity Initiative, University of Virginia
Vullikanti, Anil, PV-BII-Biocomplexity Initiative, University of Virginia

Abstract

Artificial intelligence (AI) has become a defining force in both engineering and scientific discovery, transforming how knowledge is produced and verified. As AI systems proliferate in biology and public health, how can researchers ensure that models which excel at prediction also meet high standards of transparency, credibility, and ethical responsibility?

One domain where these concerns become especially tangible is genomics. Deep learning models now play a central role in identifying viral sequences within large genomic datasets, yet their decision-making processes remain largely opaque. To address this challenge, this project reconstructs a scalable data pipeline for metagenomic processing and introduces an Integrated Gradients-based framework designed to interpret Plinko, a convolutional neural network (CNN) trained for viral classification. Through this system, we traced the model's predictions back to specific nucleotide and amino acid k-mer features and found that Plinko relies on broad genome-wide compositional biases that are characteristic of viral and host genomes. Approximately 70% of the k-mer vocabulary across both sequence types contributes significant discriminative signal, reflecting well-established evolutionary and structural constraints such as CpG suppression in viral genomes and the hydrophobic composition of viral capsid proteins. By revealing these underlying patterns, the project offers a biologically grounded explanation for Plinko's reasoning and establishes a generalizable approach for interpreting deep learning models in sequence classification tasks.
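For readers unfamiliar with the technique, Integrated Gradients attributes a prediction to input features by averaging the model's gradient along a straight path from a baseline input to the actual input and scaling by the input-minus-baseline difference; the attributions then sum to the change in model output (the "completeness" property). The sketch below is illustrative only and does not reproduce Plinko: it assumes a hypothetical logistic-regression stand-in over normalized 2-mer frequencies, an all-zero baseline, and a midpoint Riemann sum for the path integral. The function names (`kmer_frequencies`, `model`, `integrated_gradients`) and the weights are hypothetical.

```python
from itertools import product

import numpy as np


def kmer_frequencies(seq, k=2):
    """Hypothetical featurizer: normalized counts of every DNA k-mer in seq."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = np.zeros(len(vocab))
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip k-mers containing N or other symbols
            counts[index[kmer]] += 1
    total = counts.sum()
    return vocab, (counts / total if total > 0 else counts)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w, b):
    """Hypothetical stand-in classifier (NOT Plinko): P(viral | k-mer features)."""
    return sigmoid(x @ w + b)


def model_grad(x, w, b):
    """Analytic gradient of model() with respect to the input features."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG with a midpoint Riemann sum over alpha in (0, 1)."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline), w, b) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab, x = kmer_frequencies("ATGCGTACGTTAGCCGATCG", k=2)
    w = rng.normal(size=len(vocab))  # hypothetical weights for illustration
    b = 0.0
    baseline = np.zeros_like(x)  # assumed all-zero baseline
    attr = integrated_gradients(x, baseline, w, b)
    # Completeness: attributions should sum to f(x) - f(baseline)
    print(attr.sum(), model(x, w, b) - model(baseline, w, b))
    # K-mers with the largest absolute attribution
    print(sorted(zip(vocab, attr), key=lambda t: -abs(t[1]))[:3])
```

In a real CNN the gradient would come from automatic differentiation rather than a closed-form expression, and the choice of baseline materially affects which k-mers receive credit; the completeness check above is a useful sanity test either way.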

As AI proliferates in experimental biology, researchers, AI developers, and funding institutions competitively negotiate credibility and authority. Biological researchers preserve authority through validation, documentation, and empirical testing, while AI enterprises emphasize efficiency, automation, and scale. Drawing on institutional reports, practitioner forums, and case studies such as DeepMind’s AlphaFold and the European Molecular Biology Laboratory’s curation of predictive databases, the analysis shows that credibility in AI-enabled biology depends on shared practices that unite computational prediction with experimental verification. It concludes that the future of AI in scientific research depends not only on developing dual literacy, in which scientists learn to interpret algorithms and engineers learn to understand experimental rigor, but also on sustained collaboration between these two communities of expertise. Such collaboration ensures that the interpretive values of biology and the innovative capacities of computer science reinforce one another, creating systems that are both powerful and trustworthy in advancing scientific understanding.

Degree

BS (Bachelor of Science)

Keywords

Explainable AI; Computational Biology; Bioinformatics

Notes

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisors: Andrew Warren, Anil Vullikanti

STS Advisor: Peter Norton

Language

English

Issued Date

2025-12-11

Persistent Link

https://doi.org/10.18130/hpve-vh37

Suggested Citation

Bai, Yili. Uncovering Signatures of Viral Sequences by Interpreting CNN Models using Integrated Gradients; Between Code and Cell: Collaborations and Conflicts over AI in Biological Research. University of Virginia, School of Engineering and Applied Science, BS (Bachelor of Science), 2025-12-11, https://doi.org/10.18130/hpve-vh37.

Files

This item is restricted to UVA until 2030-12-11.