Who should roll their own chip?

Ben Thompson had a good post this week looking at a report that Meta had abandoned the chip it had been designing internally to power the next version of its smart sunglasses in favor of an off-the-shelf part from Qualcomm. As usual, Ben has a really solid analysis of why this chip did not fit with Meta’s AR strategy, and how the company does not seem to know what that strategy is. Reading his piece, we found ourselves thinking about who should design their own chip and who should not – a topic on which we have been doing considerable work recently.

We examined this subject in depth a few months back when we did some back-of-the-envelope math around Apple’s M1 CPU. Our conclusion there was that building a chip internally was at best a break-even proposition when the only goal is to replace a merchant silicon solution. Instead, companies need to build chips that convey some strategic advantage. For Apple, this advantage comes in the form of a tight binding of the processor to the device’s operating system (OS), which leads to noticeably better performance in their phones and laptops. For Google, its TPU and VCU likely yielded billions of dollars of capex savings.
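To make the shape of that break-even math concrete, here is a minimal sketch in Python. It is not our original M1 calculation, and every number and name in it (the fixed design cost, the merchant price, the in-house unit cost) is a hypothetical placeholder chosen only for illustration. The idea is simply that the fixed cost of designing a chip has to be paid back by whatever is saved per unit versus buying a merchant part.

```python
def break_even_units(fixed_design_cost, merchant_price, inhouse_unit_cost):
    """Units that must ship before per-unit savings repay the fixed design cost.

    All inputs are hypothetical illustration values, not real figures.
    Returns None when the in-house part is not cheaper per unit,
    because the fixed cost can then never be recovered on price alone.
    """
    saving_per_unit = merchant_price - inhouse_unit_cost
    if saving_per_unit <= 0:
        return None
    return fixed_design_cost / saving_per_unit


def net_savings(units, fixed_design_cost, merchant_price, inhouse_unit_cost):
    """Total saved (positive) or lost (negative) versus buying merchant silicon."""
    return units * (merchant_price - inhouse_unit_cost) - fixed_design_cost


# Hypothetical example: $500M to design, $100 merchant part, $50 in-house part.
units_needed = break_even_units(500_000_000, 100, 50)
print(units_needed)  # 10000000.0
```

Under these made-up inputs the project only breaks even after ten million units, and that is before counting schedule risk. Price savings alone rarely justify the effort, which is why the strategic benefit has to carry the argument.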

Seen in this light, Meta building an AR chip makes little sense. As Thompson points out, their metaverse strategy is highly confused, and building a chip was not going to solve any particular problem.

That being said, we remain highly curious as to why Facebook has not built a chip for their own data centers. They are rumored to be building their own AI accelerator, although this product seems to be taking an inordinately long time to reach the market, perhaps due to abandoned attempts with past partners. We have to imagine that Facebook could use as many such chips as Google does. Facebook does a huge amount of image recognition, a task well suited for AI chips. It is possible that Facebook’s software stack is not as homogeneous as Google’s. At its core, the vast majority of Google’s compute is used to execute a single algorithm – search (albeit an incredibly complicated algorithm). By contrast, Facebook has to perform multiple different tasks around connecting everyone with their network and ranking news items for engagement. It is possible that this diversity of tasks alters the math around designing an AI chip internally, because they would need multiple chips and any advantage would be muted by much higher expense. It is also possible that Meta does not have a chip design team capable of building such a chip, or more likely that its best chip designers were instead told to prioritize the ill-fated AR chip. Finally, it is also possible that Meta’s propensity for building data centers with the cheapest parts, premised on accepting higher failure rates, leads them to believe that they already have the cheapest processors. If true, this would mean they are performing the wrong calculation, looking at chip price rather than strategic benefit.

To be fair, Meta is not alone in this problem. As much as it seems that everyone is building their own chips, the truth is that for most companies an internal solution is not the right path. Designing chips is expensive and risky. Many companies may find that their compute needs are too diverse, with no single workload large enough to merit the effort required to build a chip internally.

As a result, chip design is coming out of companies that have the ability to differentiate with these chips, or that are defending against competitors who are doing so. Hence the mobile phone makers and some of the big Internet companies. It also makes sense for other companies making hardware. Cisco is the best example of this: they are the grandfather of these chips, having rolled their own for decades. But the other big category is clearly going to be the auto makers. As we have chronicled a bit recently, this is the next major battlefield for chip design. Some will argue that the auto company that designs its own chip will be the one that dominates the future. We are not convinced of this. Far more important will be the software that controls autonomous vehicles. Whoever does build that software will likely want to design their own chips, and that combination may make them unstoppable. But this analysis contains a mountain of “if”s, with much of the auto software stack still filed under To Be Determined.

The final segment to consider is what the large public cloud service providers are going to do. They consume a massive amount of semis, but are taking a variety of approaches to designing their own. We will return to this topic soon.

3 responses to “Who should roll their own chip?”

  1. Meta’s social nature makes their stuff highly unpredictable. They recently just killed IGTV and merged IG Video with Reels. Because they need to move fast and break things, coding stuff into the silicon makes little sense to me.

    A Google engineer in a lecture recently told that eventually, all Google’s Team will design their own silicon. That makes sense: YouTube CODECs are fairly stable, so they can create ASICs for that. Of course, search is basically the same product since Larry and Sergey created it 2 decades ago, so there’s also lack of need for flexibility.

    Meta’s core computing function, picking stuff to show you with the algorithm, will change all the time. Core functions for Google Search, Images, Maps and YouTube can move at a slower pace.

    There’s also the fact that Meta is starting to work with Amazon to pursue a hybrid cloud strategy. If Meta were to buy cloud services from itself (like Tencent does!), it would be a sizable cloud services company.

    Their AR strategy is fucking weird, but would benefit them if the core inputs for AR were monopolized by them. But I think that playing knife fight with Qualcomm is way harder than creating an ASIC tailor-made for YouTube.

    • You mentioned “A Google engineer in a lecture recently told that eventually, all Google’s Team will design their own silicon”. I 100% agree with this. Are there any links to this lecture?

      Facebook moving to AWS would be a huge shift in the industry.

      As for the difference between Google search and Facebook’s stack – I’m not sure I agree. Google is perpetually tweaking its algorithm, so I have to think that will affect silicon performance. Facebook has a much different task, but ultimately I think it is common enough across the core platform that it could benefit from internal silicon. They’re always going to need to construct their users’ social and interest graph. This is a finite concept. True, they will constantly tweak the query parameters, but that is analogous to Google tweaking their search algorithm. I get that it is computationally very different, and harder, but there is a high enough degree of commonality at the bottom of it to think they could benefit from an ASIC.

      • “However, if you look at Google’s products, where our demand for compute power continues to grow substantially, frequently at exponential type numbers, and this used to be a free ride with Moore’s law. Giving us increasing compute power to keep up with this increasing demand for computation. But that’s kind of come to an end now which isn’t great, and so we have a lot of projects at Google that are trying to solve this and I’m working on one of them. I don’t work on the most successful project we’ve had here, which is the TPU. This is a ML accelerator that has drastically increased our ability to do ML compute, and this is kind of an interesting thing in its showing that hardware that is domain specific can potentially keep up with this growing demand for compute. The problem is it’s taking a lot of effort to create these hardware accelerators and while some groups at Google are big enough, like the ML people to have dedicated teams working on dedicated hardware like the TPU, we are looking at the problem that every team at Google is eventually probably going to have to look at hardware accelerating their workloads, especially if their demand continues to rise.” – Tim Ansell, Google
