Improving Reliability and Security with Aging and Pre-RTL Modeling

Author:
Roelke, Alec, Computer Engineering - School of Engineering and Applied Science, University of Virginia
Stan, Mircea, Department of Electrical and Computer Engineering, University of Virginia

With the increasing importance of cloud computing, where low-power devices offload power-hungry computations to remote servers, the reliability of these servers becomes more important. At the same time, the emergence of the Internet of Things (IoT) has introduced a need for long-lasting electronics in devices that are expected to remain in service for long periods. Both types of systems are susceptible to aging: the slow degradation of circuit parameters that eventually leads to failure. As a result, architects need tools to evaluate the effectiveness of techniques for improving reliability, but post-RTL simulation is slow. In this work, I present a pre-RTL tool called OldSpot, which enables optimization of aging resilience using high-level models that improve simulation speed by omitting unnecessary detail while limiting the loss of accuracy. Existing aging models make assumptions about aging rates that do not hold in a system whose operational parameters change over time. OldSpot removes these assumptions: it uses directed graphs to describe how the failures of units within the system contribute to the failure of the whole, and from them it builds a lifetime distribution for the system. This enables analysis of architectural techniques, such as structural duplication, that improve lifetime.
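The idea of deriving a system lifetime distribution from a graph of unit failures can be illustrated with a small Monte Carlo sketch. This is not OldSpot's implementation: the graph layout, the unit names, the Weibull parameters, and the helper functions `failure_time` and `system_lifetimes` are all assumptions made for illustration, and the sketch uses fixed per-unit distributions rather than aging rates that change with operating conditions.

```python
import random

# Hypothetical failure graph: each internal node maps to (children, needed),
# where `needed` is how many children must fail before the node itself fails.
GRAPH = {
    "system": (["core_pair", "cache"], 1),   # fails as soon as any child fails
    "core_pair": (["core0", "core1"], 2),    # duplicated unit: fails when both fail
}

# Assumed Weibull (scale in years, shape) for each leaf unit; illustrative only.
UNITS = {
    "core0": (10.0, 2.0),
    "core1": (10.0, 2.0),
    "cache": (15.0, 2.0),
}


def failure_time(node, samples):
    """Return the failure time of `node` given sampled leaf failure times."""
    if node in samples:
        return samples[node]
    children, needed = GRAPH[node]
    times = sorted(failure_time(child, samples) for child in children)
    # The node fails when its `needed`-th child failure occurs.
    return times[needed - 1]


def system_lifetimes(trials, seed=0):
    """Sample `trials` system lifetimes to approximate the lifetime distribution."""
    rng = random.Random(seed)
    lifetimes = []
    for _ in range(trials):
        samples = {
            unit: rng.weibullvariate(scale, shape)
            for unit, (scale, shape) in UNITS.items()
        }
        lifetimes.append(failure_time("system", samples))
    return lifetimes
```

Sorting the values returned by `system_lifetimes` gives an empirical lifetime distribution. Changing `needed` for `core_pair` from 2 to 1 removes the benefit of duplication, which is one way to see how a structural technique shifts the distribution.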

OldSpot can be included in a pre-RTL tool flow that includes power, performance, and temperature simulations to create a high-level characterization of all design metrics. To enable its use in this flow, I also present an implementation of the RISC-V ISA in the gem5 simulator, a high-level microarchitecture and memory modeling tool widely used for pre-RTL performance simulation. The flow is demonstrated by simulating a heterogeneous system containing a RISC-V CPU and an accelerator to show the importance of co-designing the two units rather than designing them separately.

Another limitation on the lifetime of IoT devices is their security, which can be provided by a compact, low-power device called a Physical Unclonable Function (PUF). PUFs use natural silicon variations to create fingerprints. Despite their power and area advantages, PUFs are also susceptible to aging, which alters those variations and thereby changes their fingerprints. In this work, I show how directed aging can degrade the reliability of a PUF or even duplicate its fingerprint. I also demonstrate a method of resisting this degradation using active recovery.
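A toy model can make the effect of directed aging on a fingerprint concrete. The sketch below is an assumption-laden illustration, not the PUF design or attack studied in this work: it assumes each fingerprint bit comes from comparing one parameter of two nominally identical devices, that stress raises the parameter of the stressed device by a fixed `shift`, and all function names are hypothetical.

```python
import random


def make_puf(n_bits, seed):
    """Model a PUF as device pairs whose parameters differ by random variation."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_bits)]


def fingerprint(puf):
    """Each bit records which device of the pair has the larger parameter."""
    return [1 if a > b else 0 for a, b in puf]


def directed_aging(puf, target, shift):
    """Stress one device per pair so each bit drifts toward `target`.

    Assumption: stress adds `shift` to the stressed device's parameter.
    """
    aged = []
    for (a, b), bit in zip(puf, target):
        if bit == 1:
            aged.append((a + shift, b))   # push the pair toward a > b
        else:
            aged.append((a, b + shift))   # push the pair toward a <= b
    return aged


def hamming(x, y):
    """Count the bit positions where two fingerprints differ."""
    return sum(p != q for p, q in zip(x, y))
```

Under these assumptions, calling `directed_aging` on a victim PUF with another PUF's fingerprint as `target` and then comparing fingerprints with `hamming` shows the distance shrinking as `shift` grows, which is the sense in which aging could move one fingerprint toward another.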

PHD (Doctor of Philosophy)
Computer Engineering, Transistor Aging, Reliability, Modeling, PUF
All rights reserved (no additional license for public reuse)
Issued Date: