Reducing Network Latency for Web Applications in a Datacenter

Author:
Wang, Haoyu, Computer Science - School of Engineering and Applied Science, University of Virginia (ORCID: orcid.org/0000-0002-3604-4799)
Advisor:
Shen, Haiying, EN-Comp Science Dept, University of Virginia
Abstract:

With the rapid development of web applications in datacenters, network latency has become increasingly important to user experience. Network latency is greatly increased by incast congestion, in which responses from a large number of data servers arrive at the front-end server simultaneously. Previous solutions to this congestion problem usually act directly on the data transmission between the data servers and the front-end server, and they are not sufficiently effective at proactively avoiding incast congestion. Generally, proposals to solve this problem have focused either on refining existing window-based congestion control, as in TCP, or on introducing a distributed controller to make congestion control decisions.
In this dissertation, we introduce a Swarm-based Incast Congestion Control (SICC) system and a Proactive Incast Congestion Control (PICC) system, which focus on the incast congestion problem, and a Neighbor-aware Congestion Control algorithm based on Reinforcement Learning (NCC) for general congestion control. SICC forms all target data servers of one request that reside in the same rack into a swarm. In each swarm, one data server (called the hub) is selected to forward all data objects to the front-end server, so that the number of data servers concurrently connected to the front-end server is reduced, which avoids incast congestion. The continuous data transmission from hubs to the front-end server also facilitates further strategies for controlling incast congestion. To fully utilize the bandwidth, SICC uses a two-level data transmission speed control method to adjust the transmission speeds of the hubs, and a query redirection method further reduces request latency by balancing the remaining transmission times across hubs.

In PICC, the front-end server gathers popular data objects (i.e., frequently requested data objects) into as few data servers as possible. It also re-allocates data objects that are likely to be requested concurrently or sequentially (called correlated data objects) onto the same server. As a result, PICC reduces both the number of data servers concurrently connected to the front-end server and the number of connection establishments between data servers and the front-end server, which avoids incast congestion and reduces network latency. However, the large number of transmissions between the front-end server and the data servers storing popular or correlated data objects may produce high queuing latency in those data servers. To reduce this queuing latency, PICC incorporates a queuing reduction algorithm that assigns higher transmission priorities to data objects with smaller sizes and longer queuing times.
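PICC's queuing reduction rule, favoring smaller objects and longer-waiting objects, can be sketched as a simple scoring function over the send queue. The weights `alpha` and `beta`, the `priority` function, and the object sizes below are illustrative assumptions, not values from the dissertation:

```python
import heapq

def priority(size_bytes, wait_s, alpha=1.0, beta=1.0):
    """Higher score = transmit sooner: rewards small size and long wait.
    alpha and beta are illustrative weights for the two factors."""
    return alpha / size_bytes + beta * wait_s

# Pending queue at a data server: (object_id, size in bytes, wait in seconds).
pending = [("a", 4096, 0.5), ("b", 512, 0.5), ("c", 4096, 3.0)]

# heapq is a min-heap, so negate the score to pop the highest priority first.
heap = [(-priority(size, wait), oid) for oid, size, wait in pending]
heapq.heapify(heap)

order = [heapq.heappop(heap)[1] for _ in range(len(heap))]
# "c" has waited longest, "b" is smallest, "a" is large and recent.
```

Under these weights the long-waiting object wins over the merely small one; tuning `alpha` and `beta` trades off the two goals.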
In NCC, the rate limiting decisions on each node are driven by a local agent that uses reinforcement learning to optimize a combination of overall latency, throughput, and information shared by neighboring nodes. To make this approach efficient, each local agent chooses an overall rate limit for its node, and a separate process then assigns the traffic of individual flows within that limit. We conclude that these congestion control systems will help reduce network latency, avoid congestion, and improve the quality of service experienced by clients in a datacenter. This dissertation provides an overview of the scope of congestion control and network optimization within a datacenter, discusses some of the key challenges in building congestion control systems, and presents the hypothesized contributions. The proposed systems achieve better congestion avoidance than several end-to-end and centralized mechanisms in prior work.
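NCC's two-step structure, an RL agent picking a node-wide rate limit and a separate step dividing it among individual flows, can be illustrated with a toy epsilon-greedy bandit. The discrete rate levels, the reward weight `w`, and all class and function names here are hypothetical stand-ins for the actual NCC policy:

```python
import random

RATE_LEVELS = [100, 200, 400, 800]  # Mbps; illustrative discrete actions

class LocalAgent:
    """Toy epsilon-greedy bandit standing in for NCC's per-node RL agent."""
    def __init__(self, epsilon=0.1):
        self.q = {r: 0.0 for r in RATE_LEVELS}  # estimated value per rate
        self.n = {r: 0 for r in RATE_LEVELS}    # times each rate was tried
        self.epsilon = epsilon

    def choose_rate(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if random.random() < self.epsilon:
            return random.choice(RATE_LEVELS)
        return max(self.q, key=self.q.get)

    def update(self, rate, throughput, latency, w=0.01):
        # Reward combines throughput and a latency penalty (illustrative form).
        reward = throughput - w * latency
        self.n[rate] += 1
        self.q[rate] += (reward - self.q[rate]) / self.n[rate]

def allocate_flows(node_limit, n_flows):
    """Separate step: split the node-wide limit evenly across flows."""
    return [node_limit / n_flows] * n_flows
```

The key design point mirrored here is that the agent's action space stays small (a handful of node-wide limits) while per-flow assignment is delegated to a cheap deterministic step.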

Degree:
PHD (Doctor of Philosophy)
Keywords:
Cloud computing, Datacenter networks, Congestion control
Language:
English
Issued Date:
2021/04/27