Abstract
Today, data has become the next primary resource of humankind, especially in healthcare and public health.
Graphs or networks are commonly used data abstractions in many diverse applications, including social networks, advertising, and public health.
There is a significant risk of sensitive information (e.g., disease, condition, and individual locations) being leaked through such applications.
To mitigate the risk, companies and governments are using Differential Privacy to safeguard their sensitive data.
Two of the challenges with applying Differential Privacy are many problems lack accurate private algorithms, and existing private algorithms often struggle with scalability.
To address these challenges, this work focuses on developing privacy-preserving algorithms to solve fundamental problems arising in network science under two research threads: (1) theoretical foundations for community detection in graphs with privacy and (2) development of fast and scalable private algorithms for massive networks.
Different notions of community detection have been studied in graph mining.
One of the most popular methods involves identifying the most densely connected subgraph, and very efficient non-private algorithms are known for this task.
In the first thread, this work led to the first sequential and parallel densest subgraph algorithms designed under edge differential privacy, as well as a lower bound that specifies the minimum amount of utility trade-off required to guarantee privacy.
The parallel version, while matching the sequential's asymptotic utility, is several orders of magnitude faster, and is practical for analyzing large networks.
Another popular topic in this area is the community detection problem in Stochastic Block Models (SBM), which is one of the few settings where rigorous information-theoretic results are known for the recoverability of the ground-truth communities.
The SBM serves as a test bench for community detection algorithms, addressing the classical Information Theory and Network Science problem of determining the limits of such algorithms on random graphs.
Though this topic has been extensively studied, solving it with differential privacy (DP) constraints remains an open question.
In the first part of this problem, we explored multiple approaches to figure out the first set of conditions for recoverability, with different trade-offs among them.
Notably, stability analysis, one of the approaches, can be integrated with different non-private community detection algorithms to achieve exact recovery with different trade-offs, demonstrating its potential as a promising framework for private mechanism design.
In the second part, we extended the stability analysis to achieve exact recovery for more realistic variants of SBM (general SBMs, including those with unbalanced communities and edge labels).
We also improved the analysis to establish a polynomial time algorithm with a recovery bound that matches the non-private setting's for the binary symmetric variant of the Stochastic Block Models (SBMs) when the privacy loss tends to infinity.
In the second thread, our focus is on improving the efficiency and scalability of graph statistics calculations, such as counting small subgraphs in massive networks.
Methods based on global sensitivity have poor accuracy, and alternatives to global sensitivity, such as smooth sensitivity and ladder function, have been proposed.
While they improve the accuracy of private queries, computing them is computationally challenging for most statistics.
As a result, using these methods for massive networks becomes impractical.
To address this challenge, we introduced the concept of approximate smooth sensitivity and its framework for efficient private queries, demonstrating its effectiveness by speeding up the private triangle count calculations by orders of magnitude while maintaining comparable accuracy to other methods.
We then attempted to solve another classic graph statistic, the count of $k$-cliques, where the triangle counting problem is a special case.
Unfortunately, $k$-clique's smooth sensitivity is intractable.
We transformed the $k$-clique's ladder function (which had been previously known) into a smooth upper bound on its local sensitivity, allowing us to extend the framework to approximate quickly.