Making all the chips

Last week we cast a skeptical eye on the ambitions of Internet companies rolling their own chips. Our core thesis holds that for companies to pursue this strategy, they need to build chips that confer some form of strategic advantage. Building a chip simply to save money on those chips is usually not worth the effort. In this light, Apple building its own A and M Series processors matters because it differentiates its phones, tablets and computers from those of its competitors. Similarly, Google building its TPU and VCU saves that company billions of dollars a year in reduced capex and opex. By contrast, Facebook building an AR chip made less sense, in part because the company’s AR strategy is not fully fleshed out.

So what does this say about the biggest enterprise consumers of electronic systems, the public cloud service providers: Amazon’s AWS, Microsoft’s Azure and Google’s GCP?

It turns out that the answer is not entirely clear. Note that the success stories above all come from companies that control the entire software stack running on top of their homegrown chips. For the cloud providers, this is not the case. Each of them has to support the software stacks of all their customers: the enterprises, start-ups and governments that run in those clouds. No one can build a chip that is optimized for every form of software out there.

In these situations there are really only three ways homegrown chips might work for cloud service providers. The first is for special-purpose workloads, and here we are really talking about AI accelerators. This is still an emerging field, with no dominant software stack. Everybody wants to run more “AI”, so having a chip that makes that more efficient might make sense. Unfortunately, even here every AI software stack is a little different. Small changes to the specific neural net algorithm implemented can have an outsized impact on chip performance, enough to negate any advantage of purpose-built silicon. Someone could try to build a general-purpose AI chip, but those already exist: we call them GPUs, and Nvidia is doing a pretty good job making them.

The second category, general-purpose chips (CPUs), is somewhat derivative of the first. Here the goal is to build a system with a lower total cost of ownership than off-the-shelf merchant solutions. The best example of this is AWS’s Graviton, an Arm-based server CPU. Arm CPUs are almost as good as x86 CPUs in terms of raw performance (we know this is highly debatable, and it is a topic we will return to soon), but in terms of cost, Arm chips often come out a bit ahead because they generally use less power for a given increment of compute. AWS may just see enough advantage in building its own CPU (and its AI cousins, Trainium and Inferentia). Given AWS’s scale, it is possible that they can save a significant amount of money by going down this path, but it may not make sense for anyone else.

Finally, the cloud providers may want to run whatever software they themselves control on top of their own silicon. AWS obviously supports Amazon’s ecommerce website. In addition, the overhead of managing AWS itself is a significant software workload. No one knows the exact amount, but we have heard estimates that 10%-20% of AWS workloads go to managing AWS. In both of these cases, Amazon controls all the software and so they might see real performance gains by moving to homegrown silicon. We suspect that AWS runs a lot of its own workloads on Graviton already. Getting all of Amazon’s and AWS’s software ported over will take time, so it may be a few years before we can really gauge Graviton’s progress, but the fact that AWS is sticking with it for another generation gives us some confidence that Amazon is in fact moving down this path.

But what about GCP and Azure? Here it is harder to say. Both face the same problem of having to support the highly heterogeneous software stacks of their customers. So it is not too surprising that neither has really launched its own silicon for its public cloud business. That said, their cases are a bit different.

GCP is the smallest of the three and seems the least certain about its strategic direction, so it may be in no hurry to roll out its own CPUs. On the other hand, Google has built immense internal chip design capabilities, which somewhere down the road may make it easy enough for someone at GCP to join in the fun.

By contrast, Microsoft would seem to be the perfect candidate for building its own CPU. Azure’s selling proposition seems to be “Enterprises run a lot of Microsoft on-premise, let’s make it easy for them to run in Azure too.” (Note to Microsoft: we are available for marketing consultations.) It would seem that Microsoft has a big leg up when it comes to squeezing out strategic advantage in the cloud by tying its own software to its own silicon. The problem seems to be cultural. Microsoft has spent 40 years optimizing its software to run on x86. Consider the decade it took to port Windows to Arm, an effort that needed immense pressure from Qualcomm to become real. There are persistent rumors that Microsoft is working on a PC CPU to help its Surface line compete with Apple’s M1, but Surface is a drop in the bucket of total PC volume. On the data center side, Azure seems to have little interest in building its own chips. There are rumors that it is working on something, but those rumors have been circulating for far longer than it would take to design a chip (or two, or three). Microsoft could very well be working on something quietly; the surprise is just that it has taken this long.

Putting this all together, there is no clear, compelling case for the cloud service providers to build their own chips. The needs of their customers do not point in this direction, at least not yet. We think all of them will eventually go further down this path than they have already, but it may still take some time.

Post-script: We probably should include Alibaba’s Aliyun on this list. The same logic holds for them as for the other three, and we have already chronicled their attempt to build a CPU.
