Embedded Tutorial ET1: Better-than-Worst-Case Timing Designs

Achieving high performance within stringent power budgets is emerging as one of the most difficult challenges in the design of current-generation digital systems. In synchronous systems, switching signals are typically allowed a fixed amount of time to settle within each clock cycle, with the clock period selected to accommodate the worst-case switching delay. An additional timing margin, typically 10-20% of the clock period, is allowed beyond the nominal critical-path delay to accommodate timing uncertainties introduced by process, voltage and temperature (PVT) variations, which appear to be increasing significantly in highly scaled CMOS technologies. Unfortunately, despite the lack of switching activity, the circuit continues to consume significant static power during these timing margins, resulting in an unwanted loss of both power and performance. Furthermore, since worst-case signal paths in CMOS are highly input dependent and generally not activated in every clock cycle, this wasteful window of circuit inactivity in a typical cycle is often longer than just the timing margin. This is particularly true for circuits with a wide distribution of path delays, where the few long paths are infrequently activated; in most clock cycles the computation completes, with signals stabilizing, quite early. Clearly, significantly higher computational throughput and power efficiency could be achieved if this window of circuit inactivity could be eliminated, or at least minimized. Asynchronous and data-flow designs and architectures have long tried to exploit this statistical variability in the delays of circuit functional blocks by building in a capability for signaling the completion of each operation.
This can potentially allow execution to proceed as soon as a functional result is available, instead of waiting out the worst-case delay of each functional block. An early and classic example is carry completion signaling in ripple-carry adders, which provides an indication as soon as the carry signals have stabilized and the result is valid, following the application of each new set of inputs. Unfortunately, the efficient design of fully asynchronous and data-flow systems has proved extremely challenging. Consequently, elements of asynchronous operation have sometimes been incorporated into traditional clock-based designs using some form of handshaking control protocol. Typically, such designs dynamically allow functional units a varying number of system clock periods to complete their operation, thereby avoiding the worst-case delay in every instance. The mechanisms employed to ensure that a functional block gets sufficient time to correctly complete its operation broadly take three forms: (1) completion signaling, where the function is designed with redundant outputs (or output coding) that directly indicate when the result is valid; (2) input-based timing prediction, where (a subset of) the inputs is decoded to quickly determine whether, for those inputs, the circuit will need one, two or more cycles; and (3) error detection based recovery, where error detection circuits check the results at the end of every clock cycle and initiate a recovery, requiring additional cycles, in case of an error caused by the aggressive clock timing.

In this presentation we discuss a number of better-than-worst-case design approaches that have been proposed in the literature. We not only focus on the various low-cost error detection and recovery techniques that have been proposed, but also address other major challenges in implementing such designs.
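The carry completion idea, and the claim that most computations finish well before the worst case, can be illustrated with a small simulation. The sketch below is only a stage-count model, not circuit code: it assumes a dual-rail carry chain in which a bit position that generates or kills the carry settles after one stage, while a propagating position settles one stage after its predecessor, so the completion signal fires when the slowest carry has settled. The parameters (32 bits, 10,000 random input pairs) are arbitrary choices for illustration.

```python
import random

def carry_settle_times(a, b, n):
    """Per-bit carry settle times (in gate stages) for an n-bit
    ripple-carry adder with completion signaling."""
    times = []
    prev = 0  # carry-in is known at time 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai == bi:
            # generate (1,1) or kill (0,0): carry-out known after one stage,
            # independent of the incoming carry
            t = 1
        else:
            # propagate: carry-out known one stage after the carry-in
            t = prev + 1
        times.append(t)
        prev = t
    return times

def completion_time(a, b, n):
    # completion is signaled only once every carry has settled
    return max(carry_settle_times(a, b, n))

n = 32
random.seed(0)
samples = [completion_time(random.getrandbits(n), random.getrandbits(n), n)
           for _ in range(10_000)]
avg = sum(samples) / len(samples)
print(f"worst case: {n} stages, average completion: {avg:.1f} stages")
```

On random inputs the completion time is governed by the longest run of propagating bit positions, which is far shorter than the full carry chain on average; this is exactly the gap between worst-case and typical delay that better-than-worst-case designs try to recover.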
Key among these challenges is addressing potential flip-flop metastability, which can occur if the flip-flop inputs are allowed to arrive at arbitrary times relative to the clock signal, as is the case in such overclocked designs. Another challenge, associated with the commonly used flip-flop duplication based timing error detection approach, is false error indications.
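The payoff of error detection based recovery can be seen with a back-of-envelope timing model. The sketch below is not any specific published scheme: it simply assumes that an operation whose combinational delay exceeds the aggressive clock period is caught by a duplicate, late-sampling register and replayed at a cost of one extra cycle. The delay values, clock periods and recovery cost are all made-up numbers for illustration.

```python
def total_time(delays, clock_period, recovery_cycles=1):
    """Total execution time for a stream of operations.

    An operation whose combinational delay exceeds `clock_period` is
    assumed to be flagged as a timing error and replayed, costing
    `recovery_cycles` extra cycles."""
    cycles = 0
    for d in delays:
        cycles += 1
        if d > clock_period:
            cycles += recovery_cycles
    return cycles * clock_period

# Hypothetical per-operation delays: the long paths (9, 10) activate rarely.
delays = [3, 4, 3, 3, 9, 4, 3, 3, 4, 10, 3, 4, 3, 3, 4, 3]

worst = total_time(delays, clock_period=10)      # clocked for the worst case
aggressive = total_time(delays, clock_period=5)  # overclocked, with recovery
print(f"worst-case clocking: {worst}, aggressive clocking: {aggressive}")
```

Because the long paths are activated infrequently, the occasional recovery penalty is far outweighed by the shorter cycle time, which is the central bet behind these designs; the crossover point depends on how often errors occur, and hence on how aggressively the clock is set.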