Advancing the Reproducibility of Hydrologic Models: An Extensible Model Metadata Framework and Cyberinfrastructure for Complex Modeling

Maghami, Iman, Civil Engineering - School of Engineering and Applied Science, University of Virginia
Goodall, Jon, EN-Engr Sys & Environment, University of Virginia

A variety of hydrologic computational models are used by domain scientists to address specific water challenges such as floods, droughts, and water pollution. The number of scientific studies making use of such diverse and often complex models has been rapidly increasing. Ideally, the models and approaches developed in these past studies could be reproduced, replicated, and leveraged in future studies. Many researchers in recent years have emphasized reproducibility as an essential part of scientific research. However, the reproducibility of model-based scientific studies is still lacking, in large part because of the proliferation and complexity of hydrologic models. In this research, the components of these models are defined as two distinct concepts: the modeling software, referred to as a model program, and the model input and output files, referred to as a model instance. The overall goal of the research reported in this dissertation is to foster the reproducibility of model-based hydrologic studies by (i) creating a metadata framework to describe model programs and model instances in a more extensible way and (ii) designing cyberinfrastructure to overcome challenges in reproducing large-scale computational studies. The first part of this research goal leads to the first two studies in this dissertation. First, an extensible hydrologic model metadata framework design using a user-defined metadata schema is presented. Because the properties of a model instance needed for a single hydrologic model program can vary significantly, I present a standardized way of encoding an extensible, machine-readable model metadata schema that specifies the metadata elements for the model instances of any model program. Second, I present a methodology for building a model instance metadata schema for model programs and discuss strategies for managing the variety of metadata schemas that might be developed by the modeling community.
The third study focuses on the second part of the overall goal of this dissertation research and aims to advance the cyberinfrastructure needed to overcome data management and software architecture challenges in reproducing large-scale computational studies. The goal is to make computational hydrologic studies easier to reproduce and reuse. Across the three studies, this research advances reproducibility by demonstrating how model-agnostic metadata frameworks can advance model sharing and reuse, how it is possible to create a generic model metadata framework for hydrologic models, how containerization of the model allows for portability across computing environments, how Globus can be used for large data transfers between scientific cloud services, and how Jupyter can provide a gateway to HPC environments. Results of this research provide scientists and engineers with new ways to conduct compute- and data-intensive modeling studies using modern cyberinfrastructure that fosters reproducibility. These results not only impact the hydrologic modeling field but can also be broadly adopted by environmental and other scientific disciplines that follow a similar practice of using model program and model instance objects in their research.

PHD (Doctor of Philosophy)
Hydrologic modeling, Standard metadata schema, Model metadata, Metadata schema, Reproducibility, Open hydrology, Containerization, HPC, Jupyter, HydroShare
Sponsoring Agency:
National Science Foundation under collaborative grants 1664061, 1664119 and 1664018 for the development of HydroShare (
Issued Date: