Online Archive of University of Virginia Scholarship
OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer
Author
Xu, Guangyi, Computer Science - School of Engineering and Applied Science, University of Virginia
Advisors
Cheng, Zezhou, EN-Comp Science Dept, University of Virginia
Abstract
Shot boundary detection (SBD) is a fundamental task in video understanding whose goal is to automatically identify the transition points between shots in a video. Traditional methods typically predict only boundary locations. They do not model the semantic content within shots or the relationships between shots, so they fall short of the structured video understanding required by downstream tasks such as video generation and video editing. In addition, existing datasets generally suffer from imprecise annotations for fade transitions and from a lack of samples for complex transitions, which limits further improvements in model performance.
To address these issues, this thesis proposes a structured shot boundary detection method based on the Shot-Query Transformer. We reformulate the traditional shot boundary detection task as a structured relational prediction problem: the model predicts the extent of each shot and simultaneously models both intra-shot and inter-shot relationships. The model adopts a DETR-style query-based Transformer architecture that achieves end-to-end shot set prediction through learnable shot queries, and it uses frame-level classification to improve boundary localization accuracy. Furthermore, this thesis designs a fully synthetic data construction workflow that programmatically generates large-scale, accurately annotated transition data, and it establishes a new evaluation benchmark, OmniShotCutBench, for comprehensively assessing model performance.
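As a rough illustration of why programmatic generation can yield exact transition annotations, the sketch below joins two clips with a linear cross-dissolve and records the blended frame range at construction time. It is not the thesis's workflow, which is not described in this abstract. The names `Transition`, `make_dissolve`, and `frame_labels`, the linear blend, and the representation of frames as flat lists of floats are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """Hypothetical annotation record (not taken from the thesis)."""
    start: int  # index of the first blended frame
    end: int    # index of the last blended frame (inclusive)
    kind: str


def make_dissolve(clip_a, clip_b, length):
    """Join two clips with a linear cross-dissolve lasting `length` frames.

    Each clip is a list of frames; each frame is a flat list of floats.
    The last `length` frames of clip_a are blended with the first `length`
    frames of clip_b. Returns the composed frames and an exact annotation.
    """
    if not 0 < length <= min(len(clip_a), len(clip_b)):
        raise ValueError("length must be in 1..min(len(clip_a), len(clip_b))")

    head = clip_a[:-length]   # frames that show only shot A
    tail = clip_b[length:]    # frames that show only shot B
    offset = len(clip_a) - length

    blended = []
    for i in range(length):
        # alpha rises strictly between 0 and 1 across the transition
        alpha = (i + 1) / (length + 1)
        frame_a = clip_a[offset + i]
        frame_b = clip_b[i]
        blended.append(
            [(1 - alpha) * pa + alpha * pb for pa, pb in zip(frame_a, frame_b)]
        )

    frames = head + blended + tail
    # The annotation is known by construction, so no manual labeling is needed.
    label = Transition(start=len(head), end=len(head) + length - 1, kind="dissolve")
    return frames, label


def frame_labels(n_frames, transition):
    """Per-frame binary targets: 1 inside the transition, 0 elsewhere."""
    return [1 if transition.start <= i <= transition.end else 0
            for i in range(n_frames)]


# Usage: two 5-frame clips of constant brightness, dissolved over 3 frames.
clip_a = [[0.0, 0.0] for _ in range(5)]
clip_b = [[1.0, 1.0] for _ in range(5)]
frames, label = make_dissolve(clip_a, clip_b, length=3)
targets = frame_labels(len(frames), label)
```

Because the generator decides where the blend starts and ends, the per-frame targets are exact even for gradual transitions, which is the property that hand annotation of fades tends to lack.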
Experimental results demonstrate that our method outperforms existing mainstream approaches in multiple aspects, including boundary detection accuracy, complex transition recognition, and structured relationship prediction, thereby validating the effectiveness of structured modeling and synthetic supervision. This research provides new insights into the evolution of shot boundary detection from simple localization tasks toward comprehensive video structure understanding.
Degree
MS (Master of Science)
Keywords
Shot Boundary Detection; Video Understanding; Structured Relational Prediction; Video Editing; Temporal Localization; Video Segmentation; Computer Vision
Language
English
Rights
All rights reserved by the author (no additional license for public reuse)
Xu, Guangyi. OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer. University of Virginia, Computer Science - School of Engineering and Applied Science, MS (Master of Science), 2026-04-24, https://doi.org/10.18130/hdj6-bp08.
Files
This item is restricted to abstract view only until 2026-10-06.