Whack a Mole is an old (pre-electronic) arcade game. On a table
surface there are a number of holes with moles hiding in them. A mole pops up
out of its hole and you hit it with a hammer, causing it to retreat. The moment
that mole returns to its hole, another random one pops up and you frantically
hit mole after mole on the head until they give up and the table is clear. Achieving
timing closure on complex chip designs bears a strong resemblance to this old
game, but is less fun. As you fix each critical-path problem, new ones are uncovered
or created.
Each new generation of FPGAs has given us higher performance and
higher capacity. Designs have become larger and more complex, with many
clock domains, embedded multiply-accumulate functions, embedded processors
and a variety of memory resources. These changes have helped propel FPGAs into
many new applications. At the same time, predictability of timing in a Synthesis/Place
and Route flow has degraded with each generation, because the split of path delay between
predictable cell delay and less predictable interconnect delay has shifted steadily
toward interconnect. Interconnect is inherently less predictable because there
are many different routing paths between a driver and a load with different
delays. The fastest choices are usually the scarcest, and routing congestion
often leads to sub-optimal delays. Note that if the best routing resources weren’t
scarce, then FPGAs would be much more expensive and power-hungry.
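As a rough illustration of why scarce fast routing makes delays hard to predict, here is a toy sketch in Python. It is not how any real router works: the track delays, the capacities, the net names and the greedy `assign_routes` helper are all hypothetical, chosen only to show that the same net can end up with very different delays depending on what else is competing for the fast tracks.

```python
def assign_routes(nets, tracks):
    """Greedily give each net the fastest track that still has capacity.

    nets:   net names, in the order a (hypothetical) router processes them
    tracks: (delay_ns, capacity) tuples describing the routing choices
    """
    remaining = [[delay, cap] for delay, cap in sorted(tracks)]
    delays = {}
    for net in nets:
        for track in remaining:
            if track[1] > 0:
                track[1] -= 1          # consume one unit of capacity
                delays[net] = track[0]
                break
        else:
            raise ValueError(f"no routing capacity left for {net}")
    return delays


# Assumed resources: one fast track, two medium tracks, three slow tracks.
TRACKS = [(1.0, 1), (2.0, 2), (4.0, 3)]

first = assign_routes(["n1", "n2", "n3", "n4"], TRACKS)
second = assign_routes(["n4", "n3", "n2", "n1"], TRACKS)

print("n1 delay, processed first:", first["n1"], "ns")
print("n1 delay, processed last: ", second["n1"], "ns")
```

In this toy model nothing about net n1 itself changes between the two runs; only the competition for the single fast track does, yet its delay moves from the fastest option to the slowest.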
Another source of unpredictability is the embedded IP that has
been introduced in FPGAs over the past several generations. This IP includes
memories of varying sizes, DSP accelerators (configurable multiply-accumulate
functions) and processors distributed in a non-uniform way across the FPGA die.
Suppose you increase the size of a memory (responding to a feature request from
marketing). Now the synthesis tool changes what type of memory IP it maps to.
Unfortunately, the new memory IP is only available in a couple of special columns
on the FPGA die, so the placement of your design is distorted from the original
placement, stretching critical wires to one of those columns and back.
In many cases a fix to a timing problem involves an RTL change.
Usually, changes that improve timing increase area consumption. Consider a large
design with multiple modules. Module A has been placed next to modules B and
C. When we “fix” the RTL for module A to resolve a timing problem,
it expands into the area that modules B and C were using. This forces components
in B and C to move and stretch, often creating new critical paths. Note that
nothing about the logic in modules B or C has changed, and a logic synthesis
flow would not change its estimate of interconnect delays in B or C just because
module A increased in size. This is because the coupling between the RTL change in A
and the new critical path in B or C is physical in nature.
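The physical coupling described above can be sketched with a deliberately simplified model in Python. Everything here is an assumption made for illustration: modules are packed along a single row, interconnect delay is taken to be proportional to the distance between module centres, and the constants, module widths and the 7 ns clock period are invented. Real placement and delay models are far more involved.

```python
from dataclasses import dataclass

CELL_DELAY_NS = 2.0            # assumed fixed cell (logic) delay on the path
WIRE_DELAY_NS_PER_UNIT = 0.5   # assumed interconnect delay per unit distance
CLOCK_PERIOD_NS = 7.0          # assumed timing target


@dataclass
class Module:
    name: str
    width: int  # area, modelled as width along a single row


def place(modules):
    """Pack modules left to right and return each module's centre coordinate."""
    centres = {}
    x = 0.0
    for m in modules:
        centres[m.name] = x + m.width / 2
        x += m.width
    return centres


def path_delay(centres, src, dst):
    """Cell delay plus distance-proportional interconnect delay."""
    distance = abs(centres[src] - centres[dst])
    return CELL_DELAY_NS + WIRE_DELAY_NS_PER_UNIT * distance


# Module A sits between B and C; a path runs from B to C.
before = path_delay(place([Module("B", 4), Module("A", 4), Module("C", 4)]), "B", "C")

# The timing "fix" in A doubles its area; the logic in B and C is untouched.
after = path_delay(place([Module("B", 4), Module("A", 8), Module("C", 4)]), "B", "C")

for label, delay in (("before", before), ("after", after)):
    status = "meets" if delay <= CLOCK_PERIOD_NS else "FAILS"
    print(f"B->C path {label} A grows: {delay} ns ({status} {CLOCK_PERIOD_NS} ns)")
```

With these made-up numbers the B-to-C path goes from 6.0 ns to 8.0 ns and starts failing the 7 ns target, even though only module A's RTL changed. A flow that estimates wire delay without looking at placement would report the same delay in both cases.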
So how do we win at the “Whack a Mole” game? Effects
that lead to unpredictability in design iterations are physical in nature, which
leads naturally to physical synthesis as a solution. When the RTL for module
A is changed and the module grows, stretching the wires in modules B and C, physical
synthesis correctly estimates the new, longer interconnect, and a combination of optimization,
placement and local routing along the new critical paths automatically fixes the
problem. Many of the moles are whacked for you and you never even see them try
to pop up.
Physical synthesis usually yields a higher clock frequency,
but for many designers of complex chips its controlled, more predictable timing
convergence is of even greater value.