We have been writing a lot lately about the comparative performance of Arm CPUs and AI accelerators in the market. As much as we believe in their prospects, it is not always easy to discern exactly how good their reputed performance actually is. In our newsletter (which you can subscribe to at the top right of this site) we regularly link to performance reports on various solutions in the market. And while we occasionally see a software company report the actual performance it is seeing in production, most of the reports we see rely only on benchmarks.
There are dozens of these, depending on the chip in question. Some are put out by industry organizations, others by academics, and some by reputable analysts. But they all share a common problem: they do not necessarily reflect real-world results. Every company has its own particular software workloads, and these all behave slightly differently on different semis. The benchmarks can be a decent proxy, but they will always be just that, an altered reflection of reality.
Nowhere is this more true than in the market for data center semis. Here, roughly a dozen customers consume the large majority of industry output. These companies buy in bulk, sometimes millions of chips a year, and they make their chip selections only after very rigorous evaluation. Small changes in chip performance can have an outsize impact on system performance, and billions of dollars are at stake. Crucially, these companies do not use public benchmarks when making decisions.
Instead, they take (and sometimes buy) samples of the chips in question, run their own software on those semis, and carefully measure the output. They have to do this to make sure they are buying the optimal solution. The catch is that they almost never let their vendors see the software. In part, this is because that software is the very definition of competitive differentiation, and leaks can have a big impact. But they also do not want the vendors to game the tests. We will not name names, but many very large chip companies are known to add special 'features' to their chips that activate when running benchmarks. This is not quite the same as Volkswagen tuning its engines to behave differently during environmental tests, but it is not entirely dissimilar.
Ultimately, the publicly available benchmarks are interesting, but only in the sense that they make good marketing copy.