Functional Data Analysis for Sparse Functional Data

Author:
Zhang, Yin, Statistics - Graduate School of Arts and Sciences, University of Virginia
Zhou, Jianhui, Department of Statistics, University of Virginia

With the development of science and modern technology, more and more data are collected continuously over a time interval in disciplines such as public health, biology, medicine and finance. Such data can be viewed as "functional data". Functional data analysis (FDA), which deals with the analysis and theory of functional data, has gained increasing popularity over the past decades. In this dissertation, we propose several functional data analysis methods and apply them to an NIH cohort study in the field of growth modeling.

It is well known that early-life catch-down growth is highly prevalent in developing countries because of malnutrition (Black et al. [2008]). Children who suffer from malnutrition in the first 5 years of life are at increased risk of impaired cognitive and physical development. Therefore, characterizing catch-down growth and identifying the associated risk factors is a widely studied topic. In our study, we aim to investigate the relationship between the height-for-age Z score (HAZ) at year 3 and a collection of predictors. However, we face two problems. First, all functional predictors are sparsely and irregularly observed; that is, the measurement times vary from individual to individual. The functional predictors must therefore be estimated over the entire time interval before the regression can be performed. In addition, some predictors, such as height, should be monotone over time, so a non-monotone estimate of height would make no sense. Second, the relationship between the response and the functional predictors is usually not linear. Furthermore, there are outliers in the response.

To address the first problem, we propose a new method for estimating monotone functions from sparse growth data, based on a monotone transformation, functional principal component (FPC) analysis and penalized regression. We also establish the asymptotic properties of the proposed estimator. Extensive numerical studies show that the proposed method outperforms existing methods in terms of model fit and monotonicity of the estimates. In addition, the proposed method can serve as a data preprocessing step for other methods, such as functional clustering and classification, that require the functional predictors to be completely observed.
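The abstract does not give the estimator's formulas, so the following is only a minimal sketch of how a monotone transformation enforces monotonicity, assuming one common form: write a curve as f(t) = a + ∫_0^t exp(w(s)) ds, take the log-slope w to be piecewise constant on an equally spaced grid, and fit (a, w) by penalized least squares with a roughness penalty λ Σ (w_{j+1} - w_j)^2. It handles a single subject and omits the FPC step that pools information across sparsely observed subjects. All function names, the grid size, λ and the data are hypothetical.

```python
import math

# Hypothetical sketch of a monotone transformation:
#   f(t) = a + integral_0^t exp(w(s)) ds, with w piecewise constant on a grid.

def make_grid(t_max, n_knots):
    """Equally spaced knots 0 = g_0 < ... < g_K = t_max."""
    return [t_max * k / n_knots for k in range(n_knots + 1)]

def overlaps(t, grid):
    """Length of each interval [g_j, g_{j+1}] that lies inside [0, t]."""
    return [min(max(t - grid[j], 0.0), grid[j + 1] - grid[j])
            for j in range(len(grid) - 1)]

def evaluate(params, t, grid):
    """Every slope exp(w_j) is positive, so f is non-decreasing in t."""
    a, w = params[0], params[1:]
    return a + sum(math.exp(wj) * lj for wj, lj in zip(w, overlaps(t, grid)))

def loss(params, times, ys, grid, lam):
    """Squared error plus a penalty on jumps in the log-slope w."""
    try:
        sse = sum((evaluate(params, t, grid) - y) ** 2 for t, y in zip(times, ys))
    except OverflowError:  # exp(w_j) too large: treat as an infeasible step
        return float("inf")
    w = params[1:]
    return sse + lam * sum((w[j + 1] - w[j]) ** 2 for j in range(len(w) - 1))

def gradient(params, times, ys, grid, lam):
    """Analytic gradient of loss() with respect to (a, w_0, ..., w_{K-1})."""
    w = params[1:]
    g = [0.0] * len(params)
    for t, y in zip(times, ys):
        r = evaluate(params, t, grid) - y
        g[0] += 2 * r
        for j, lj in enumerate(overlaps(t, grid)):
            g[j + 1] += 2 * r * math.exp(w[j]) * lj
    for j in range(len(w) - 1):
        d = 2 * lam * (w[j + 1] - w[j])
        g[j + 1] -= d
        g[j + 2] += d
    return g

def fit_monotone(times, ys, t_max, n_knots=6, lam=1.0, iters=500):
    """Gradient descent with Armijo backtracking; times must be sorted."""
    grid = make_grid(t_max, n_knots)
    # Start from the straight line through the first and last observations.
    slope0 = max((ys[-1] - ys[0]) / (times[-1] - times[0]), 1e-6)
    params = [ys[0] - slope0 * times[0]] + [math.log(slope0)] * n_knots
    step = 1.0
    for _ in range(iters):
        f0 = loss(params, times, ys, grid, lam)
        g = gradient(params, times, ys, grid, lam)
        gnorm2 = sum(gi * gi for gi in g)
        if gnorm2 < 1e-12:
            break
        step = min(step * 2.0, 1.0)
        while True:  # shrink the step until it gives a sufficient decrease
            trial = [p - step * gi for p, gi in zip(params, g)]
            if loss(trial, times, ys, grid, lam) <= f0 - 1e-4 * step * gnorm2:
                break
            step *= 0.5
            if step < 1e-12:
                return params, grid
        params = trial
    return params, grid

times = [0.1, 0.4, 0.5, 1.3, 2.0, 2.7]            # hypothetical visit ages (years)
heights = [52.0, 61.0, 64.0, 78.0, 86.0, 93.0]    # hypothetical heights (cm)
params, grid = fit_monotone(times, heights, t_max=3.0)
curve = [evaluate(params, 0.1 * k, grid) for k in range(31)]
print([round(v, 1) for v in curve[::5]])
```

The exponential makes every slope positive, so the fitted curve is increasing no matter how sparse or noisy the visits are; the price is a fitting problem that may be non-convex, which is why the sketch uses a backtracking line search rather than a closed-form solve.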

To address the second problem, we build a functional single index model for the non-linear relationship between the response and the functional predictors. The functional single index model is both flexible and interpretable. To deal with outliers, we propose an estimation method based on local modal regression (LMR) (Yao et al. [2012]). We show that, with the optimal bandwidth, the LMR estimator is robust when there are outliers or the error distribution is heavy-tailed, and is also asymptotically as efficient as the ordinary least squares estimator when the errors are Gaussian. In addition, we conduct extensive simulation studies to demonstrate the robustness and efficiency of the resulting estimator, comparing it with the least squares estimator and the Huber estimator across different error distributions.
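To illustrate why a modal-regression objective resists outliers, here is a minimal sketch in the simplest setting, assuming a scalar predictor, a straight-line fit, a Gaussian kernel and a fixed, hand-picked bandwidth h. It maximizes Σ φ_h(y_i - b0 - b1 x_i) with the modal EM (MEM) algorithm, which alternates kernel weights and weighted least squares. The local (kernel-in-x) smoothing, the functional single index structure and the optimal bandwidth choice described above are not reproduced, and the function names and data are hypothetical.

```python
import math

def wls_line(xs, ys, ws):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def modal_line(xs, ys, h, iters=200, tol=1e-10):
    """Modal linear regression fitted by the modal EM (MEM) algorithm.

    Maximizes sum_i phi_h(y_i - b0 - b1 * x_i) for a Gaussian kernel phi_h.
    """
    b0, b1 = wls_line(xs, ys, [1.0] * len(xs))  # start from OLS
    for _ in range(iters):
        # E-step: kernel weights; points with large residuals get weight near 0.
        ws = [math.exp(-0.5 * ((y - b0 - b1 * x) / h) ** 2)
              for x, y in zip(xs, ys)]
        # M-step: weighted least squares with those weights.
        nb0, nb1 = wls_line(xs, ys, ws)
        done = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if done:
            break
    return b0, b1

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Hypothetical data near y = 1 + 2x, with outliers at x = 4 and x = 5.
ys = [1.1, 2.8, 5.2, 6.9, 40.0, 41.0, 13.2, 14.8, 17.1, 18.9]
ols = wls_line(xs, ys, [1.0] * len(xs))
modal = modal_line(xs, ys, h=1.0)
print("OLS   intercept, slope:", ols)
print("modal intercept, slope:", modal)
```

The two outliers pull the OLS intercept upward, while the modal fit stays close to the line through the remaining points because their kernel weights are essentially zero. As h grows, all weights approach 1 and the MEM fixed point approaches the OLS fit, which gives some intuition for how a well-chosen bandwidth can trade robustness against efficiency.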

PhD (Doctor of Philosophy)
monotone function estimation, penalized regression, local modal regression