Much of the focus in semis is on chip performance, so for many outside the process it can be mystifying why a “better” chip sometimes loses out to a “weaker” one. To name just one example, Intel still sells a lot of server CPUs despite comparing poorly with the latest AMD or Arm offerings. Much of this comes down to the structure of the data center market, which is far more complicated than many would think. This matters for any company looking to tap into this market, whether with a CPU or the latest AI accelerator.
The first issue is that the market for data center silicon is highly concentrated among ten customers: the “Super 7” (Amazon, Google, Facebook, Microsoft, Baidu, Alibaba, and Tencent), to which we would add Oracle, JD.com, and Apple. These companies consume well above 50% of the industry’s server-grade CPUs and 70%-80% or more of other data center silicon segments. Beyond them, the shift of enterprise IT to the cloud leaves a highly fragmented assortment of smaller customers: financial firms, research labs, a few oil and gas companies, and some of the smaller Internet companies.
For large, established semis companies, there is almost no way around this concentration. They have to target the biggest customers, because anything below the top ten is too small to move the needle. Many start-ups in the space are starting with the smaller customers, who can provide enough revenue to keep the lights on and the VCs interested, but eventually they will need to break into the big leagues.
Those big customers are fully aware of their market position, and they are writing big checks, so they make their suppliers run a gauntlet of qualification. This begins years before a chip is actually produced, as the chip designers seek input from their customers on chip specifications. How much memory, and of what type, will the customer use? How many I/O channels? Next come emulations of the chip design, typically running on FPGA boards. Once the design is finalized, it is sent to the foundry for manufacture. Then the real work begins.
The hyperscalers have rigorous testing processes in place, complete with their own sets of confusing acronyms. Typically, this starts with a handful of chips to experiment with in the lab, followed by a few dozen, enough to build a working server rack. All of this merely proves that the chip performs as promised at the design stage. The next step is to build a full-blown system of a few thousand chips. At this stage, customers typically run their actual production software and monitor performance very closely. This step is particularly painful for the chip designers because they have no access to the customers’ software and so no way to test its performance ahead of time.
Around this time, customers also build sophisticated total cost of ownership (TCO) models. These weigh the total performance of the system against the cost of not only the chips but also the other elements of their servers: memory, power consumption, cooling, and more. A difficult reality in this market is that while the main processor is the most important part of any server, it typically accounts for only around 20% of that server’s cost. These models ultimately drive the customer’s purchase decisions.
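To see how a faster chip can still lose, here is a toy TCO-per-performance comparison in Python. It is a minimal sketch under stated assumptions, not any hyperscaler’s actual model: every price, wattage, and performance figure is hypothetical, cooling and facility overhead is folded in through a single assumed PUE (power usage effectiveness) multiplier, and the `ServerConfig` class and its fields are illustrative names only. Real TCO models also account for rack space, networking, software licensing, failure rates, and much more.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Hypothetical server bill of materials; all numbers are illustrative."""
    name: str
    cpu_cost: float       # processor purchase price, USD (assumed)
    other_hw_cost: float  # memory, board, storage, NICs, chassis, USD (assumed)
    power_watts: float    # average IT power draw, watts (assumed)
    relative_perf: float  # throughput on the customer's workload, normalized

    def tco(self, years: float, usd_per_kwh: float, pue: float) -> float:
        """Capital cost plus energy cost over the deployment lifetime.

        PUE scales IT power up to cover cooling and facility overhead,
        a simplifying assumption for this sketch.
        """
        hours = years * 365 * 24
        energy_kwh = self.power_watts / 1000 * hours * pue
        return self.cpu_cost + self.other_hw_cost + energy_kwh * usd_per_kwh

    def tco_per_perf(self, years: float, usd_per_kwh: float, pue: float) -> float:
        """Lifetime cost divided by delivered performance (lower is better)."""
        return self.tco(years, usd_per_kwh, pue) / self.relative_perf


# Hypothetical deployment assumptions: 4-year life, $0.10/kWh, PUE of 1.5.
params = dict(years=4, usd_per_kwh=0.10, pue=1.5)

# Server A: 20% faster CPU, but it needs pricier memory/boards and more power.
a = ServerConfig("Server A (faster CPU)", cpu_cost=4000, other_hw_cost=16000,
                 power_watts=600, relative_perf=1.2)
# Server B: slower CPU in a cheaper, cooler system.
b = ServerConfig("Server B (slower CPU)", cpu_cost=3000, other_hw_cost=12000,
                 power_watts=400, relative_perf=1.0)

for s in (a, b):
    capex = s.cpu_cost + s.other_hw_cost
    print(f"{s.name}: CPU share of capex {s.cpu_cost / capex:.0%}, "
          f"TCO ${s.tco(**params):,.0f}, "
          f"TCO per unit perf ${s.tco_per_perf(**params):,.0f}")
```

With these made-up inputs the CPU is 20% of each server’s hardware cost, and Server B comes out ahead on TCO per unit of performance even though Server A’s chip is 20% faster, because the chip’s advantage is diluted by the memory, power, and cooling costs around it.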
While all this is going on, the chip company has to scramble. When the chip first comes back from the foundry, it may have bugs, and the manufacturing process needs tuning for better yield. So in the early days there are never enough chips to go around. Every customer wants to try them out, forcing the chip designer to triage priorities and ration supply. With only a handful of customers, this step carries considerable risk: no customer ever feels it has the supplier’s full support. Even as volumes increase, new problems arise. Customers do not want to buy chips; they want to buy complete systems. So the chip companies need to line up support from the ODM ecosystem. Those ODMs have to produce their own set of designs, for the board and the entire rack, and these need to be evaluated too. This is a big part of Intel’s staying power: every ODM is willing to do these designs for Intel, as they likely do other (PC) business with it. Everyone else has to make do with smaller ODMs or the big ODMs’ “B” team of designers.
From first pen-to-paper to first sizable purchase order, the whole process can take three to four years. That is not as painful as automotive design cycles, but in many regards it is even more challenging.
In our newsletter this week, we linked to the news that Ampere is selling a Developer Kit version of its latest chips. While Ampere is still tiny relative to Intel, they have been doing this for long enough to have some experience navigating all the steps above. Those developer kits are a clever way to broaden their market. Ampere is small enough that smaller customers still matter to them. However, they are not yet big enough to provide full sales support to those customers. The developer kit broadens the top of their sales funnel by letting curious engineers participate in the first two steps of the evaluation process.
None of this is easy, and these complexities all rest on top of the challenge of actually designing a chip.