Article 06
We built GPU DRC because CPU checks were too slow.
We hit this during the HC-1 ASIC route. CPU-side physical verification was taking too long for the way we needed to work. When a check takes multiple hours, every routing change gets heavier: you start batching decisions, waiting on tools, and losing the thread of the layout problem you were actually trying to solve.
So we wrote a CUDA DRC path to pair with KLayout. On the captured HC-1 ASIC replay, GDS export took 38.360s, the RTX 4090 GPU DRC/report stage took 8.960s, and the end-to-end replay finished in 47.320s.
Once a check comes back in under a minute, it changes how you route. You can try the fix, inspect the result, and keep moving while the design is still in your head.
The replay
The replay used real ASIC layout output from the HC-1 route. The path exported the GDS into a packed polygon-edge dump, checked M1-M9 width and spacing rules, and wrote a KLayout report database for normal marker-browser inspection.
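To make the shape of a width/spacing pass concrete, here is a minimal Python sketch of the per-layer logic, checking axis-aligned rectangles against a width rule and a spacing rule. This is an illustration, not the tool's code: the actual path works on a packed polygon-edge dump and runs the pair checks in CUDA, and the function names and rectangle encoding here are assumptions.

```python
from itertools import combinations
from math import hypot

def min_width(rect):
    """Narrow dimension of an axis-aligned rectangle (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return min(x2 - x1, y2 - y1)

def spacing(a, b):
    """Euclidean spacing between two axis-aligned rectangles; 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return hypot(dx, dy)

def check_layer(rects, rule_width, rule_space):
    """Return (width_violations, spacing_violations) for one layer."""
    width_v = [i for i, r in enumerate(rects) if min_width(r) < rule_width]
    # spacing == 0 means touching/merged shapes, which is not a spacing error
    space_v = [(i, j) for (i, a), (j, b) in combinations(enumerate(rects), 2)
               if 0 < spacing(a, b) < rule_space]
    return width_v, space_v

# Example: three shapes, 30-unit width rule, 50-unit spacing rule
rects = [(0, 0, 100, 100), (120, 0, 220, 100), (300, 0, 400, 100)]
print(check_layer(rects, 30, 50))  # -> ([], [(0, 1)])
```

The brute-force pair loop is what makes a GPU attractive: every rectangle pair (or edge pair) is an independent comparison, so the quadratic work maps directly onto thread blocks instead of serializing on one CPU core.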
The captured run configured 18 checks across layers 19/0, 20/0, 30/0, 40/0, 50/0, 60/0, 70/0, 80/0, and 90/0, and reported zero violations. We have released the tool under Apache 2.0: Acculux GPU DRC.
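The 18 checks factor as nine metal layers times two rules each (width and spacing). A sketch of such a check table follows; the metal-name-to-layer pairing and the rule values are placeholders for illustration, since the actual HC-1 rule deck is not reproduced here.

```python
# Layer/datatype pairs as configured in the captured run. Which GDS layer
# corresponds to which metal name, and the 0.07 rule values, are assumptions.
LAYER_MAP = {
    "M1": (19, 0), "M2": (20, 0), "M3": (30, 0),
    "M4": (40, 0), "M5": (50, 0), "M6": (60, 0),
    "M7": (70, 0), "M8": (80, 0), "M9": (90, 0),
}

RULES = {name: {"min_width": 0.07, "min_space": 0.07} for name in LAYER_MAP}

# One width check plus one spacing check per layer -> 18 checks total
CHECKS = [(name, rule) for name in LAYER_MAP for rule in ("min_width", "min_space")]
print(len(CHECKS))  # -> 18
```

Keeping the rule deck as plain data like this is what lets the same GPU kernels serve every layer: the kernel is generic, and only the layer selection and thresholds change per check.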
Why we cared enough to build it
HC-1 is a system-level effort. It spans photonic layout, digital control, interface mapping, noise modeling, recovery, and end-to-end validation. Slow verification at any one layer drags the whole program. This GPU DRC work removed one of those delays from the ASIC side and made the metal-check loop feel like part of routing again.
This tooling work rarely makes the headline, but it matters. When the check is fast enough to sit inside the routing loop, the whole design process gets sharper.
What comes next
The same idea may be worth applying to other parts of the open-source physical-design stack. OpenROAD has places where acceleration could matter, especially when routing and repair loops start to dominate iteration time. The first public package is the KLayout companion GPU DRC path.
The immediate result is already clear for HC-1: a multi-hour CPU bottleneck became a seconds-scale GPU replay on real ASIC layout data.