Distributed Data Learning with Knowledge Transmission Topology

Author:
Li, Yifan, Statistics - Graduate School of Arts and Sciences, University of Virginia
Advisor:
Tang, Xiwei, AS-Statistics (STAT), University of Virginia
Abstract:

Vast amounts of data, such as hospital health records and individual device usage data, are now stored across many separate locations. Keeping these datasets distributed helps preserve individual privacy and keeps data volumes manageable. However, the resulting constraints on data sharing and aggregation make comprehensive data analysis challenging. In this thesis, we investigate statistical modeling in distributed data systems under several information transmission structures. In Chapter 2, we study a penalization-based model integration problem with a network constraint. We propose a network sparsification method that substantially reduces communication across data sites; the method is computationally more efficient while preserving estimation efficiency. In Chapter 3, we develop a Decentralized Federated Learning framework in which sites neither share nor aggregate their data. We explore different knowledge-sharing mechanisms between sites, with the goal of building a predictive model for each individual site without a central server. We also examine how different transmission topologies affect communication efficiency.

Degree:
PhD (Doctor of Philosophy)
Keywords:
Distributed Data Learning, Statistical Machine Learning, Graph Sparsification, Decentralized Federated Learning, Fused LASSO
Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2024/04/30