Novel Statistical and Systems Engineering Approaches Towards Gene Network Inference

Author:
Muthiah, Annamalai, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Advisors:
Lee, Jae, Md-Pbhs Public Health Sciences Admin, University of Virginia
Patek, Stephen, Department of Systems and Information Engineering, University of Virginia
Keller, Susanna, Department of Medicine, Endocrinology and Metabolism, University of Virginia
Learmonth, Gerard, Frank Batten School of Leadership & Public Policy, University of Virginia
Garcia, Alfredo, University Center, School of Cont and Prof Studies, University of Virginia
Abstract:

Dynamical networks such as gene regulatory networks (GRNs), which span the entire genome of animals and humans (roughly 20,000 to 40,000 genes), are complex systems that exhibit emergence and self-organization. When the macroscopic behavior of a GRN, that is, the activity of the genes in the network, was simulated over time with a random Boolean network, the emergent dynamics were stable, exhibited homeostasis, changed only gracefully under mutation, and were nonetheless capable of complex behavior (Kauffman, 1995). These macroscopic properties arise because complex systems such as GRNs can spontaneously organize themselves into an orderly dynamical network, a concept known as “order for free”.
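The random Boolean network idea mentioned above can be illustrated with a minimal sketch. It assumes the standard N-K formulation (each gene has K randomly chosen regulators and a random truth table, updated synchronously); the network size, K, the random seed, and all function names are illustrative choices, not values taken from the thesis.

```python
import random

def make_random_boolean_network(n_genes, k_inputs, rng):
    """Pick K random regulators and a random truth table for every gene."""
    inputs = [rng.sample(range(n_genes), k_inputs) for _ in range(n_genes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k_inputs)]
              for _ in range(n_genes)]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronously update every gene from the current state of its regulators."""
    new_state = []
    for regulators, table in zip(inputs, tables):
        index = 0
        for r in regulators:
            # Build the truth-table row index from the regulators' on/off states.
            index = (index << 1) | state[r]
        new_state.append(table[index])
    return tuple(new_state)

def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient_length, cycle_length)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return seen[state], t - seen[state]

# Illustrative parameters (assumptions): 12 genes, 2 regulators per gene.
rng = random.Random(0)
n_genes, k_inputs = 12, 2
inputs, tables = make_random_boolean_network(n_genes, k_inputs, rng)
initial = tuple(rng.randint(0, 1) for _ in range(n_genes))
transient, cycle = find_attractor(initial, inputs, tables)
print(f"transient length = {transient}, attractor cycle length = {cycle}")
```

Because the state space is finite and the update rule is deterministic, every trajectory must eventually enter a cycle (an attractor). A short attractor reached after a short transient is the kind of orderly behavior that the “order for free” concept refers to.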

On the other hand, large volumes of gene expression data are being generated from biological experiments, and there is a great need to reverse engineer the GRNs that generate these data, in other words, to infer the microscopic gene interactions of the dynamical GRN from the macroscopic gene expression data it produces (the process of “network inference”). Microarrays provide a comprehensive snapshot of gene expression across the entire genome (>20,000 genes) from which the network can then be inferred. Current machine learning techniques for constructing networks from microarray data adopt a genome-wide (‘global’) approach to network inference and tend to produce a large number of false-positive gene interactions. Instead of performing network inference among all the genes simultaneously (‘globally’), the process can be simplified by taking a ‘local’ search approach to network inference from gene expression data.

High-throughput gene activity data, such as microarray data, can be collected either as static snapshots of gene activity under different biochemical conditions or as time series of gene activity after a perturbation. When inferring networks from static gene expression data, contemporary network inference algorithms perform ‘global’ network inference and underutilize the prior Biological information available about the network. By Anchoring network construction around prior network Knowledge, the problem of network inference from large volumes of static data can therefore be reduced to a ‘local’ network Expansion problem. This principle formed the basis of one of the novel network inference approaches I proposed and validated in this study, Biologically Anchored Knowledge Expansion (BAKE). In particular, I applied the BAKE approach to infer gene regulatory networks from gene activity data obtained from fat cells isolated from insulin-resistant mice, anchoring the construction to insulin signaling pathway genes and thus to knowledge about the network already available in the literature. Novel genes in the gene activity data were thereby organized as strong clusters around known insulin signaling pathway genes. An important advantage of the BAKE approach is that it dramatically reduces the discovery of false positives by retaining only those novel gene interactions that are tightly linked to prior knowledge of the network. When gene networks were constructed around insulin signaling pathway genes using BAKE, I discovered a novel gene, Krueppel-like factor 4 (KLF4), in the network around two key insulin signaling genes, IRS2 and TSC2, and subsequently validated the interactions between these genes in additional animal experiments. I also validated the network inference ability of the BAKE approach in an in-silico (computational) experiment that tested whether BAKE could reconstruct hidden portions of an adipogenesis network. Using a partial version of the network as prior knowledge, I estimated BAKE’s precision in inferring edges of the network to be 44%, comparable to that of other network inference algorithms.
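The general idea of a ‘local’, anchor-based search can be sketched as follows. This is not the BAKE algorithm itself, whose details are not given in this abstract; it is a hypothetical illustration that assumes Pearson correlation as the similarity measure and a fixed cutoff, and it uses synthetic expression values and placeholder gene names (`anchor_A`, `gene_1`, and so on).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)

def expand_around_anchors(expression, anchors, threshold):
    """Link each anchor gene only to non-anchor genes that correlate strongly with it.

    Assumption: |Pearson r| >= threshold stands in for "tightly linked";
    the thesis may use a different criterion.
    """
    edges = []
    for anchor in anchors:
        for gene, profile in expression.items():
            if gene in anchors:
                continue
            r = pearson(expression[anchor], profile)
            if abs(r) >= threshold:
                edges.append((anchor, gene, round(r, 3)))
    return edges

# Synthetic expression values across five conditions (illustration only).
expression = {
    "anchor_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "anchor_B": [5.0, 3.0, 4.0, 1.0, 2.0],
    "gene_1": [1.1, 2.1, 2.9, 4.2, 4.8],  # closely tracks anchor_A
    "gene_2": [2.0, 1.0, 2.0, 1.0, 2.0],  # tracks neither anchor
    "gene_3": [4.9, 3.2, 3.8, 1.1, 2.1],  # closely tracks anchor_B
}
edges = expand_around_anchors(expression, ["anchor_A", "anchor_B"], threshold=0.9)
for anchor, gene, r in edges:
    print(f"{anchor} -- {gene} (r = {r})")
```

Only anchor-to-candidate pairs are scored, so the number of comparisons grows with the number of anchors times the number of genes rather than with the square of the number of genes, which is one way a local search can limit the opportunity for false positives.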

When inferring networks from time series gene expression data, I also took a local approach, performing Network Inference by Anchoring the network around its building blocks, the Motifs. Each motif represented a subunit of the network made up of three genes; regulatory relationships between the genes within each motif were inferred, and the network was then built up from the motifs. This principle formed the basis of my other novel network inference approach, Motif Anchored Network Inference (MANI). I implemented the MANI approach on time series data obtained by perturbing a 7-gene network that is part of the adipogenesis cascade, and validated its ability to reconstruct a small (n=10) in-silico network made available by the Dialogue on Reverse Engineering Assessment and Methods (DREAM) consortium. The precision of network inference by the MANI approach was 40%, comparable to other contemporary network inference algorithms. However, the MANI approach was distinguished from other contemporary algorithms by its ability to generate “dynamical” features of the constructed network, such as hierarchical relationships between network genes and time-sensitive activation of the network cascade, and by easily interpretable network construction owing to its strong focus on the underlying mechanisms of network regulation.
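The precision figures quoted for network inference follow the usual definition: the fraction of predicted edges that are present in the reference network. A minimal sketch is given below, assuming directed edges represented as (regulator, target) pairs; the function name and the toy edge lists are hypothetical and unrelated to the thesis data.

```python
def edge_precision(predicted_edges, reference_edges):
    """Fraction of predicted edges that also appear in the reference network."""
    predicted = set(predicted_edges)
    if not predicted:
        return 0.0  # no predictions made; precision reported as 0 by convention here
    true_positives = predicted & set(reference_edges)
    return len(true_positives) / len(predicted)

# Toy directed edges (regulator, target); illustration only.
reference = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g1", "g4")]
predicted = [("g1", "g2"), ("g2", "g3"), ("g4", "g1"), ("g1", "g4")]

precision = edge_precision(predicted, reference)
print(f"precision = {precision:.2f}")
```

In this toy case the edge ("g4", "g1") counts as a false positive because its direction is reversed relative to the reference; whether direction is scored is an assumption of the sketch.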

I expect these novel approaches, which take a local rather than a global search approach to network inference, to have wide application in constructing gene networks from large genomics data sets.

Degree:
PHD (Doctor of Philosophy)
Keywords:
Gene Regulatory Networks, Systems Engineering, High-Throughput Data, BAKE, MANI
Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2016/04/21