The hottest topic in semis nowadays is the way big tech companies are building their own silicon. We wrote about some of the motivations behind this here and explored the economics of them here. The quick summary: non-semis companies do not build chips to save money; they build them for strategic advantage (which may include saving money).
So it should come as no surprise that YouTube, one of Google’s key strategic assets, quietly unveiled a new chip last month. For some reason, though, no one is really talking about it. Maybe this is the new normal and we have gotten used to it, or maybe no one is talking about it because it badly scares the chip companies.
Google quietly unveiled its new Video Coding Unit (VCU) in a blog post from the YouTube Engineering team. The team also published an academic paper on the project (one that is hard to find in Google search results). The blog post is filled with really interesting background on how YouTube runs, as well as the usual mind-blowing statistics: every minute, people upload 500 hours of video. The problem the team was trying to solve is both fairly straightforward and staggeringly difficult. Users upload video at a certain quality, but YouTube then has to play that video back at a different quality depending on the viewer’s device: 4K for a TV on a wired connection, lower resolution for a mobile device on a shaky network. The process of converting between these quality levels is called transcoding, and it is very compute intensive. Google faced some key trade-offs here. By using a new encoding scheme they could greatly reduce their bandwidth costs (which must be massive), but this came at the cost of requiring a lot more compute. The VCU solved that problem.
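To get a feel for the scale, the 500-hours-per-minute figure works out to 720,000 hours of new video per day, and each upload has to be encoded into several output resolutions. The sketch below is a back-of-the-envelope calculation, not YouTube's pipeline: the resolution ladder, the no-upscaling rule, and the codec count are all illustrative assumptions.

```python
# Hypothetical output ladder (heights in pixels); the post does not give YouTube's real one.
LADDER = [2160, 1440, 1080, 720, 480, 360, 240, 144]

UPLOAD_HOURS_PER_MINUTE = 500  # figure quoted from the YouTube blog post


def daily_upload_hours(per_minute: float = UPLOAD_HOURS_PER_MINUTE) -> float:
    """Hours of new video uploaded per day."""
    return per_minute * 60 * 24


def renditions_for(source_height: int, ladder=LADDER) -> list[int]:
    """Output resolutions at or below the source height (assumes no upscaling)."""
    return [h for h in ladder if h <= source_height]


def daily_output_hours(source_height: int, codecs: int = 1) -> float:
    """Hours of video to encode per day if every upload had this source height.

    Assumes one encode per rendition per codec; both are simplifications.
    """
    return daily_upload_hours() * len(renditions_for(source_height)) * codecs


if __name__ == "__main__":
    print(daily_upload_hours())                # 720000
    print(renditions_for(1080))                # [1080, 720, 480, 360, 240, 144]
    print(daily_output_hours(1080, codecs=2))  # 8640000
```

Even under these simplified assumptions, a single day of uploads turns into millions of hours of encoding work, which is why the compute cost of a heavier codec matters so much.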
A few things stood out for us about the VCU.
First, it is fast. They claim it delivers a 20x-33x improvement in compute efficiency. By comparison, a new generation of CPU can be expected to deliver roughly double (2x) the speed of its predecessor, and that gain is usually driven mostly by Moore’s Law. When Google announced its machine learning chip, the TPU, six years ago, they talked about 30x-80x better performance, which allowed them to halve the number of data centers they would need to build. So the VCU’s gain is not quite as large (the math it is doing is a lot more complex), but it should clearly still have a big impact on YouTube’s data center needs.
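One way to put the 20x-33x claim next to the 2x-per-generation CPU figure is to ask how many CPU generations it would take to compound to the same gain. A minimal sketch, assuming gains multiply cleanly and that the fleet needed for a fixed transcoding workload shrinks in proportion to the efficiency gain (both simplifications):

```python
import math

CPU_GEN_SPEEDUP = 2.0  # "roughly double" per CPU generation, as stated in the post


def equivalent_cpu_generations(speedup: float, per_gen: float = CPU_GEN_SPEEDUP) -> float:
    """Number of CPU generations at `per_gen` each needed to compound to `speedup`."""
    return math.log(speedup) / math.log(per_gen)


def fleet_fraction(speedup: float) -> float:
    """Fraction of machines needed for a fixed workload after a `speedup` efficiency gain."""
    return 1.0 / speedup


for s in (20, 33):
    print(f"{s}x ~ {equivalent_cpu_generations(s):.1f} CPU generations, "
          f"{fleet_fraction(s):.1%} of the original fleet")
```

By this rough measure, the VCU delivers the equivalent of four to five CPU generations in a single step.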
Secondly, and more importantly, the team that built the chip was made up largely of software engineers, and not that many of them. This demonstrates how ‘easy’ it has become to design a chip. The academic paper has 52 co-authors, and we are guessing that comprises the whole team. The article makes it sound as if these are the same people who are building YouTube’s software platform. So the team has a built-in advantage against any merchant chip company that wants to compete in this space. The big processor vendors (e.g. Nvidia, AMD, Intel, Qualcomm) usually have several hundred designers working on their core products, and at least as many people building software for those chips. Google’s approach is much more efficient: the benefit of vertical integration.
Admittedly, it takes many more people to bring a chip to production. Google probably worked with the “ASIC” team of one of the big chip companies to do the back-end work (our guess is Broadcom). But there is a more disturbing possibility: does Google have its own chip operations team? They would only build one if they have several chips in production. If true, that should really worry the big chip companies. It is not clear who did the ASIC work for Google; the article does not make it clear and Google’s job listings are ambiguous, but an in-house team is definitely a possibility.
There is a lot for the chip industry to take in here. We mentioned the design advantage and efficiency Google enjoyed. But there is also the fact that Google has essentially just invented a new category of chip, again. As far as we know, no one is building anything directly comparable to a VCU. So Google, needing a solution to its specific problem, went out and built one itself. And we are not sure any chip company will go out and build a competitive product, because the market for this chip has a fairly small customer pool: Google, maybe Facebook, and a couple of companies in China. By the same token, this is one more insurmountable barrier to entry into YouTube’s market. No one will be able to compete with YouTube unless they can build a chip too.
It certainly seems like Google either already has an in-house ASIC team and is adding to it, or is building one: they are hiring in Sunnyvale and Israel, and there are already ASIC verification engineers in India.
You have great insights on the industry. Keep up the good work! I am a fellow former banker and equity research analyst turned corporate strategist, and now CFO/advisor to early-stage companies. If you are ever looking for help researching companies, markets, etc., please let me know. I love to get into the weeds on things!
Cheers, Byron Raco
On Tue, May 11, 2021 at 8:25 AM Digits to Dollars wrote:
> D/D Advisors posted: “The hottest topic in semis nowadays is the way that big tech companies are building their own silicon. We wrote about some of the motivations behind this here and explored the economics of them here. The quick summary of all that is non-semis companies”
Did you see page 9 of the Google paper showing performance and then performance/TCO? They compared a system using 10 PCIe cards with 2 VCU chips each (so 20 VCUs) against a dual-socket Skylake server. Performance was 20x better, but performance/TCO was only 7x better for the VCU system. The die size of these VCU chips must be enormous, given how big the heat sinks are. Is that all that impressive? Of course, if you use 20 accelerator ASICs you should get better performance than Skylake. How does this look against AMD Milan?
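The two page-9 ratios quoted in this comment imply a third: dividing the performance ratio by the performance/TCO ratio gives the ratio of total costs. A quick sketch, taking the commenter's reading of the figures at face value (the function name is just illustrative):

```python
def implied_tco_ratio(perf_ratio: float, perf_per_tco_ratio: float) -> float:
    """If perf_A/perf_B = P and (perf_A/TCO_A)/(perf_B/TCO_B) = Q, then TCO_A/TCO_B = P/Q."""
    return perf_ratio / perf_per_tco_ratio


# Commenter's reading of page 9: 20 VCUs vs. one dual-socket Skylake server.
ratio = implied_tco_ratio(perf_ratio=20, perf_per_tco_ratio=7)
print(f"VCU system TCO ~{ratio:.2f}x the Skylake server")  # ~2.86x
```

In other words, on these figures the VCU system costs roughly three times as much to own as the Skylake server, and it is still well ahead on performance per dollar.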
I think the more important comparison is with VP9, their new encoding scheme. I think that was actually the whole point of the VCU: YouTube wants to move to VP9 because it must cut their bandwidth costs immensely. So there, the TCO is 33x better than Skylake. My best guess is that it is still 15x-20x better than Milan.
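The 33x and 15x-20x estimates together carry an implicit assumption about Milan itself. Dividing one by the other gives the improvement over Skylake that the guess credits Milan with. A small sketch of that arithmetic (the figures are the commenter's estimates, not measurements, and the function name is hypothetical):

```python
def implied_baseline_gain(vs_old: float, vs_new: float) -> float:
    """If an accelerator is `vs_old`x better than CPU A and `vs_new`x better than CPU B
    on the same metric, then B is vs_old / vs_new times better than A."""
    return vs_old / vs_new


for vs_milan in (20, 15):
    print(f"{vs_milan}x vs Milan implies Milan ~{implied_baseline_gain(33, vs_milan):.2f}x Skylake")
```

So the 15x-20x guess amounts to assuming Milan is roughly 1.65x to 2.2x better than Skylake on this workload.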
OK, I see what you mean. The ASIC must be optimized for VP9, since it actually performs slightly better on VP9 than on H.264, while Skylake alone, of course, has 1/6 the performance. But still, 100x better performance with 20x more silicon: is that even all that impressive? The picture of the PCIe card they show has a whole lot of chips on it. I wonder how much this is driven by Google’s reluctance to move to co-packaged optics in the future, even as they confront the challenges of 800G and 1.6T pluggable transceivers. As you say, bandwidth is a concern for them.
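The per-silicon objection can be made explicit by dividing the system-level gain by how much more silicon the system uses. This is deliberately crude (it treats all silicon as equivalent and ignores die size, process node, and power), and the 20x silicon multiple is the commenter's figure:

```python
def per_unit_gain(system_gain: float, silicon_multiple: float) -> float:
    """System-level speedup divided by how much more silicon the system uses.

    Crude: treats all silicon as equal, ignoring die size, process node and power.
    """
    return system_gain / silicon_multiple


print(per_unit_gain(100, 20))  # 5.0 (VP9, commenter's figures)
print(per_unit_gain(20, 20))   # 1.0 (H.264, from the page-9 comparison)
```

On this normalization the VCU's edge is modest for H.264 and meaningful for VP9, which fits the point that the chip was built around VP9.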
I think (and I really have to re-read the paper a few more times, so this is a guess) that when they say 33x better TCO (and in the interview blog he says 33x effective performance), they are factoring in all the hardware, cooling, and electricity costs. I’ve built TCO models for cloudscalers, and they factor in all those things. So that 33x is 33x all-in, which is definitely worth it.
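A TCO model of the kind described here can be sketched in a few lines: amortized hardware cost plus electricity, with the data center's PUE (power usage effectiveness) scaling IT power up to cover cooling. Everything below is an illustrative assumption (straight-line depreciation, the default electricity price and PUE, the hypothetical systems), not the model Google or the commenter used; a fuller model would add items such as floor space and networking.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class System:
    capex: float       # purchase price in dollars (hypothetical)
    years: float       # depreciation period
    watts: float       # average IT power draw
    throughput: float  # work done per hour, in arbitrary units


def annual_tco(s: System, price_per_kwh: float, pue: float) -> float:
    """Straight-line depreciation plus electricity; PUE scales IT power to cover cooling."""
    depreciation = s.capex / s.years
    energy_kwh = s.watts / 1000 * HOURS_PER_YEAR * pue
    return depreciation + energy_kwh * price_per_kwh


def perf_per_tco(s: System, price_per_kwh: float = 0.10, pue: float = 1.1) -> float:
    """Work done per year divided by annual all-in cost."""
    return s.throughput * HOURS_PER_YEAR / annual_tco(s, price_per_kwh, pue)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the model.
    cpu_server = System(capex=20_000, years=4, watts=600, throughput=1.0)
    vcu_server = System(capex=60_000, years=4, watts=1_500, throughput=20.0)
    ratio = perf_per_tco(vcu_server) / perf_per_tco(cpu_server)
    print(f"perf/TCO advantage: {ratio:.1f}x")
```

Because power and cooling sit in the denominator alongside purchase price, a perf/TCO ratio computed this way is an "all-in" figure in the sense the comment describes.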