Last week we dug into the economics of Internet companies building their own chips. We highlighted the strategic and (lesser) financial advantages of doing so, but we cautioned that the process carries some serious risks. In this post, we want to dig into what those downsides could look like.
When people look at this process, there is a tendency to focus on the upfront costs. You have this nice Internet business selling ads or enterprise software or some mix of everything, and someone in the operations department has the idea to build a chip. The first question is going to be: why? But we can assume the operator has that answer well rehearsed. So the next question is going to be: how much will it cost? Hundreds of millions of dollars upfront seems daunting, but that is not the real problem, especially when you have hundreds of billions of dollars in the bank earning negative interest.
No, the real problem is what happens if something breaks. Recall from our earlier posts that rolling your own chip only makes sense when the chip in question conveys some meaningful strategic value. This means the chip will be deployed in a critical area. If that chip slips or fails, the whole company’s strategy is suddenly off track, and maybe in a catastrophic way.
Let’s look at Google’s recent VCU chip. Someone at Google convinced management to invest in building a chip, likely driven by YouTube’s desire to upgrade to a more bandwidth-efficient encoding scheme for videos. That upgrade will likely save billions in network costs. But if the VCU gets delayed, the whole upgrade process is delayed until the chip is fixed. And since the team has already allocated hundreds of millions of dollars to this project, they likely deferred investments in the legacy system. Why spend capex dollars on equipment that will be obsoleted by your own chip? So not only is the chip off track, but so is the transition to the new encoding scheme, which hundreds of people have been planning around for a year.
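To give a sense of the scale at play, here is a rough back-of-envelope sketch in Python. Every number in it – annual egress, delivery cost per gigabyte, codec savings – is an illustrative assumption we made up for the exercise, not a figure from Google.

```python
# Back-of-envelope math on codec savings at video scale.
# Every number here is an illustrative assumption, not a Google figure.

exabytes_served_per_year = 1_000   # assumed annual video egress, in exabytes
cost_per_gb_delivered = 0.01       # assumed blended delivery cost, $ per GB
codec_bitrate_savings = 0.30       # assumed bitrate reduction from a newer codec

gb_per_year = exabytes_served_per_year * 1e9   # 1 EB = 1e9 GB
baseline_cost = gb_per_year * cost_per_gb_delivered
annual_savings = baseline_cost * codec_bitrate_savings

print(f"Baseline delivery cost: ${baseline_cost / 1e9:.0f}B per year")
print(f"Savings from the codec upgrade: ${annual_savings / 1e9:.0f}B per year")
```

Under these made-up inputs the savings come out to a few billion dollars a year, which is why the phrase “save billions” is not hyperbole – and why a delay to the chip that unlocks those savings hurts so much.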
The best worst-case hypothetical is Apple. Apple just built its own CPU. These are huge engineering projects, with probably thousands of people involved and years of design work. At some point in that process, someone had to say, “We will have this chip ready on this specific date, for this specific product on our roadmap.” Since CPUs are the heart of the computer, all kinds of other things had to change. The hardware designers had to factor in this new chip, which is not a trivial exercise. More critically, the software team had to make sure their output would be ready. Again, this involves thousands of people. Instead of designing a new feature or fixing bugs, all these people had to get the software ready for that new chip. It is probably not a coincidence that Apple’s software quality took a big hit during the second half of the 2010s, just when this prep work for the new silicon was likely at its most intense.
So imagine that, late in the process, they had found a bug that required a rework of the chip. Suddenly the launch date has to be pushed out, and the company has a giant hole in its product line.
The threat is even bigger for the iPhone. The timing of Mac releases seems a lot more flexible, but everyone expects a new iPhone every year. A six-month delay due to a chip rework would be a major disaster for Apple.
Undertaking this kind of endeavor is not for the weak-willed, and chip designers do a lot of work to mitigate all these risks. We touched previously on the subject of chip design software tools. A rough estimate holds that over half of that tooling spend goes to simulation and verification, making sure the chip will do what it was designed to do. Companies can also build back-up options – designing an iPhone around merchant silicon, for instance – but this is hugely expensive and usually just ends up draining resources that would be better spent on the base plan.
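For readers who have not seen it, here is a toy sketch of what simulation-based verification looks like in spirit: exercise a model of the design against a golden reference across many randomized inputs and flag any mismatch. Real flows run RTL through commercial simulators with frameworks like UVM or cocotb; the “design under test” below is a stand-in we wrote purely for illustration.

```python
# Toy illustration of simulation-based verification: compare a model of
# the design against a golden reference over randomized inputs.
# Real chip verification simulates RTL; this Python model is illustrative.
import random

WIDTH = 8
MAX_VAL = (1 << WIDTH) - 1

def golden_saturating_add(a: int, b: int) -> int:
    """Reference model: 8-bit add that clamps at the maximum value."""
    return min(a + b, MAX_VAL)

def dut_saturating_add(a: int, b: int) -> int:
    """Stand-in for the design under test (would be simulated RTL)."""
    return min(a + b, MAX_VAL)

def run_random_tests(n: int = 10_000) -> None:
    for _ in range(n):
        a = random.randint(0, MAX_VAL)
        b = random.randint(0, MAX_VAL)
        expected = golden_saturating_add(a, b)
        actual = dut_saturating_add(a, b)
        assert actual == expected, f"Mismatch: {a}+{b} -> {actual}, expected {expected}"
    print(f"{n} randomized tests passed")

if __name__ == "__main__":
    run_random_tests()
```

Now imagine doing that for every block on a billion-transistor chip, across every corner case, and it becomes clear why verification eats so much of the tools budget.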
And here is the biggest problem – bugs will happen. Not necessarily in a catastrophic way, but there are going to be errors in chips. The whole project is too complicated, with too many moving parts, for everything to go right. Also, recall that Apple is building SoCs, chips that perform multiple functions – graphics, raw compute, image processing, neural networks, etc. – and that they are pushing the envelope on all of those fronts. There will be bugs somewhere.
Apple and all the others are also riding Moore’s Law really hard, constantly pushing to the leading edge of process nodes. There is a class of bugs that only appears in manufacturing, which means they can only be discovered dangerously close to product launch. Most of these can be fixed quickly, but ask any chip designer how well they sleep on the nights right before first silicon comes back. When you are designing products that someone else is going to manufacture, using processes that brush up against quantum effects, no one is cool enough not to be a little worried that something will go wrong.