Amid a flurry of announcements last week, Amazon unveiled its new chip for machine learning (ML) training (with a write-up from TechCrunch here). We mention this in case it was not already clear that Amazon is a chip powerhouse.
We often conflate ML with AI, and for our purposes here we will use them somewhat interchangeably. (As the joke goes, statistics are written in R, ML is written in Python, and AI is written in PowerPoint.) ML is an important new software tool that is very powerful for some very useful applications such as image recognition and language processing. Some people call it “teaching” computers to “make decisions”, but it is more akin to very advanced pattern recognition. We are not trying to diminish the significance of ML/AI, but we have to walk a fine line between the marketing and the reality.
A key aspect of ML systems is that they first need to be “trained”. Give a computer a set of data, run some very powerful software to analyze that data, and it will return a model which can then be used to infer results from new data. For example, feed the ML system a database of photos of human faces or X-rays and it will then be able to detect faces in other photos or find anomalies in other X-rays. The key point is that there are two steps: “training” and “inference”. Training systems need a lot of resources because they are chewing through a lot of data. By contrast, inference systems need to be fast as they use the model to detect targets, usually in real time. And since we are talking about running ML on semiconductors, there are two classes of ML accelerators (or ASICs, special purpose ML chips) – training and inference chips.
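To make the two steps concrete, here is a minimal sketch in Python. It assumes a toy nearest-centroid classifier standing in for a real neural network, and the function names `train` and `infer`, the labels, and the two-number "features" are all made up for illustration. The point is only the shape of the work: training loops over the whole data set to produce a small model, while inference is a quick comparison against that finished model.

```python
from collections import defaultdict
from math import dist


def train(samples):
    """Training: pass over every labeled example and return a compact model.

    `samples` is a list of (features, label) pairs. The "model" here is just
    the average feature vector (centroid) for each label, a toy stand-in for
    the weights a real training run would produce.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def infer(model, features):
    """Inference: compare new data against the finished model.

    No training data is needed at this point, only the model itself.
    """
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training set: two made-up numeric features per example.
training_data = [
    ([1.0, 1.0], "face"),
    ([1.2, 0.8], "face"),
    ([5.0, 5.0], "not_face"),
    ([4.8, 5.2], "not_face"),
]

model = train(training_data)           # slow, data-hungry step
prediction = infer(model, [1.1, 0.9])  # fast, per-request step
print(prediction)
```

In a real system the training loop runs over millions of examples and the model holds millions or billions of parameters, which is why the two steps end up on different kinds of hardware.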
Turning back to Amazon. Their new Trainium chip, as the name implies, is built for Training models. They launched their first Inference chip two years ago, handily named Inferentia, and we believe they are now on their second or possibly third generation of Inferentia.
In their announcement, Amazon points out that Trainium is 45% faster than running on the fastest GPU instance AWS offers. This is, to put a technical term on it, a lot better. When Google kicked off the ML accelerator business with its TPUs almost ten years ago, they said that their chips would halve the number of data centers they would need to build. So big numbers, and we are now several generations of chips beyond that.
The industry has used GPUs for more than a decade to perform ML math, but purpose-built chips will always outperform general purpose chips. GPUs have to do a lot of other things, and paring a chip down to its essence, as AI accelerators do, yields big performance gains. There are still some important reasons to use GPUs (read: software), but increasingly the ML workloads are moving to purpose-built silicon.
All of this has big implications for the broader chip industry. As we have noted a few times in the past, the big tech companies now have strong strategic reasons for building their own chips, enabled by shifting industry economics. As much as the press focusses on Apple and its M1, Amazon’s silicon path is arguably just as important for the industry.
Note that when Amazon announced Trainium, they were actually announcing a slew of new ML instances coming to AWS. Included among those were instances using Intel’s Habana chips. Intel acquired Habana late last year for close to $2 billion. We would guess that Intel acquired Habana once that start-up had won its Amazon business. So imagine you are Intel. You paid $2 billion for a design win at the biggest customer for your most profitable business unit, and when that customer announces your win, they also launch their own competitive product. And note the performance: Amazon says Habana is 40% better than GPUs, while Trainium is 45% better.
And this is not a problem only for giant Intel. It is also a problem for the 100 or so ML accelerator start-ups out there. The chip industry in general is heavily constrained by intense customer concentration in many categories. Something like the top eight cloud-scale data center owners consume 50%-60% of output for many key categories. For a start-up to go against that requires significant performance advantages. Those solutions exist, but industry economics make it understandable why venture investors are wary of investing in chips and why China seems to be able to leapfrog US competitors in this one area of semis.