Sometimes big performance breakthroughs require taking a step back and re-examining a problem. We've done just that for SBOCV table generation and are really pleased with the outcome. (Stage-based on-chip variation, SBOCV, is also sometimes called advanced on-chip variation, or AOCV.) We can now generate tables for all the combinatorial cells in the TSMC 40LP library in just under 4 hours! That's 100 times faster than our June release.
We've been working closely with TSMC for a while now to provide a very fast solution for SBOCV table generation. Earlier this year, TSMC told us it took them 3 weeks to create tables for 30 simple cells using a commercial version of SPICE. That wasn't one copy of SPICE; it was 120. The looming problem is that some of the more complex cells in the TSMC libraries have as many transistors as the 30 simple cells combined. As the complexity and transistor count of cells increase, so does the simulation time.
Clearly, TSMC and our other customers want to generate tables for more than a small subset of the cells in their libraries. Compared to SPICE, Amber Path FX is blazingly fast. However, both our own testing and feedback from TSMC showed that finishing a whole library with Amber Path FX still took too long.
So how did we make table generation 100x faster than our June release? Well, we broke the problem into two questions. How can we make our fast transistor simulation even faster? And how can we take advantage of the commodity hardware we have at our disposal?
The first question was answered mainly with good old-fashioned software engineering: profiling, testing, optimization, and plenty of trial and error. Throw in some heuristics for good measure, and we were able to reduce the simulation time for complex cells by an order of magnitude. That's a tremendous improvement, but still not enough to get the kind of turnaround times we were looking for. A further benefit of this work is that we now have a great set of data to keep improving performance in the coming months.
The answer to the second question seems obvious now, but it took us a little while to realize how well it would turn out. Amber Path FX was already multi-threaded, and it scales nicely on a single machine. Our server farm has a bunch of 4- and 8-CPU machines and a few 16-CPU machines. So what would happen if we could get more than one machine working on the problem at a time? It turns out that making the software multi-threaded from the start had laid a good foundation for distributed computing as well.
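The post doesn't describe how work is divided among machines. As an illustration only, assuming each cell's tables can be generated independently and that a rough per-cell cost estimate (for example, from transistor count) is available, a classic greedy longest-processing-time-first (LPT) scheduler hands out the most expensive cells first, so a few complex cells don't end up as stragglers at the end of the run. The function names and cell costs below are hypothetical; this is not Amber Path FX's actual scheduler.

```python
import heapq
from typing import Dict, List, Tuple


def assign_cells_lpt(cell_costs: Dict[str, float], num_machines: int) -> List[List[str]]:
    """Greedy LPT assignment of cells to machines (hypothetical sketch).

    cell_costs maps a cell name to an estimated simulation cost.
    Returns one list of cell names per machine.
    """
    if num_machines < 1:
        raise ValueError("need at least one machine")
    # Min-heap of (current load, machine index): the least-loaded machine pops first.
    heap: List[Tuple[float, int]] = [(0.0, m) for m in range(num_machines)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(num_machines)]
    # Most expensive cells first; ties broken by name for a deterministic result.
    for name, cost in sorted(cell_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, m = heapq.heappop(heap)
        buckets[m].append(name)
        heapq.heappush(heap, (load + cost, m))
    return buckets


def makespan(cell_costs: Dict[str, float], buckets: List[List[str]]) -> float:
    """Estimated finish time: the load of the busiest machine."""
    return max(sum(cell_costs[c] for c in b) for b in buckets)


if __name__ == "__main__":
    # Made-up relative costs; one complex cell dominates the simple ones.
    costs = {"INV": 1, "NAND2": 2, "NOR2": 2, "AOI22": 4, "MUX4": 30, "FA": 12}
    plan = assign_cells_lpt(costs, 3)
    for i, cells in enumerate(plan):
        print(f"machine {i}: {cells}")
    print("makespan:", makespan(costs, plan))
```

With these made-up costs, MUX4 lands alone on one machine and its cost sets the makespan, while the small cells pack onto the least-loaded machine. When cost estimates are unreliable, a dynamic work queue, where each idle machine pulls the next cell, is a common alternative.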
We turned the distributed version of the Amber Path FX SBOCV table generator loose over the weekend on 15 of our 8-CPU machines (120 CPUs in total). The results: a total run time of 3 hours, 50 minutes, and 7 seconds for 852 cells.
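The reported run can be turned into per-cell throughput with a little arithmetic: 13,807 seconds across 852 cells is about 16.2 seconds of wall-clock time per cell, or at most about 32.4 CPU-minutes per cell. The helper below is a hypothetical illustration; the CPU-time figure is only an upper bound because it assumes all 120 CPUs stayed busy for the entire run.

```python
from datetime import timedelta


def run_stats(run_time: timedelta, machines: int, cpus_per_machine: int, cells: int) -> dict:
    """Derive simple throughput figures from a reported batch run (hypothetical helper)."""
    total_s = run_time.total_seconds()
    cpus = machines * cpus_per_machine
    return {
        "total_s": total_s,
        "cpus": cpus,
        # Wall-clock seconds per cell, averaged over the whole batch.
        "wall_s_per_cell": total_s / cells,
        # Upper bound on CPU time per cell: assumes every CPU was busy
        # for the entire run, which real jobs rarely achieve.
        "cpu_s_per_cell_max": total_s * cpus / cells,
    }


if __name__ == "__main__":
    # Figures from the weekend run: 15 machines x 8 CPUs, 852 cells.
    s = run_stats(timedelta(hours=3, minutes=50, seconds=7), 15, 8, 852)
    print(f"{s['total_s']:.0f} s on {s['cpus']} CPUs")
    print(f"{s['wall_s_per_cell']:.1f} s wall-clock per cell")
    print(f"<= {s['cpu_s_per_cell_max'] / 60:.1f} CPU-minutes per cell")
```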
We think that Amber Path FX is going to be a great answer to the "Where do we get AOCV tables?" question. If you are seriously looking at adding SBOCV/AOCV to your static timing flow, we'd love to share our results with you.
