Amazon enters the ARM Server fray

Last week Amazon announced that they have built their own server-grade CPU. We have not touched on this product here for some very specific reasons, but this is big news, and it ties in neatly to our Semis report from last month, so we felt we had to broach the subject.

Server CPUs are a roughly $20 billion market. Readers may be familiar with hot products like GPUs and special machine learning chips for the data center. Those are great, but they are still a niche market when compared to CPUs. What makes server CPUs especially interesting is the fact that Intel has enjoyed a near-monopoly on them for close to a decade. Intel makes about 30% of its revenue and almost 50% of its profits from these. As we touched on in our note last month, customers are increasingly uncomfortable with Intel. In many ways, Intel has been behaving like a monopolist. Its last few generations of products have been viewed with disappointment, offering few performance boosts. At the same time, the company has been steadily raising prices. Amazon now seems to have taken the logical next step and built its own product to answer this need.

Please keep in mind that Amazon, through its AWS cloud service provider subsidiary, is probably the largest customer for Intel’s server line. Our best guess is that Amazon consumes ~20% of Intel’s Xeon chips.

So at first blush, this is bad news for Intel. We would argue that the bad news has barely begun. Intel is not going to spiral to zero, but they are up against some big problems.

First, we have to take a look at what Amazon has announced, which turns out to not be entirely clear. Their announcement was centered on their service offering: various instances running on their new Graviton processor. We are not experts in AWS pricing, but this looks to be priced roughly at par with other AWS compute instances. They have disclosed very little about the chip itself. One article in The Register got hold of a bit of detail, but nowhere near enough to perform a real comparison. Reading the tea leaves of all this, we believe that Amazon’s Graviton is very much a trial product. This is their first generation of silicon (as far as we know) and performance looks pretty modest. The Register compares it with Qualcomm’s Snapdragon 835, which is a great product for phones, but is far from the ‘heavy iron’ demanded of an always-on server CPU.

Crucially, the Graviton has very modest software support. Amazon points out that it runs three versions of Linux (Ubuntu, Red Hat and Amazon’s own flavor). We would classify this as barely table stakes. Their lead designer, in a separate blog post, says the Graviton targets workloads such as “web servers, caching fleets, and development workloads”. Allow us to translate. Web serving and caching are highly repetitive functions of relatively low compute intensity. And for ‘development workloads’, read: testing out things that you would really want to run on a more powerful CPU when in production, facing actual customers.

Graviton is built on an ARM architecture, which is of course what makes it so interesting: it is not running on Intel’s x86. Greatly oversimplifying, ARM makes use of smaller processing cores. These take up less real estate on a chip (and recall that semis are an exercise in applied geometry). This means they can fit more cores on a single chip, which makes the chip cheaper and lends itself to highly repetitive tasks (like web serving and cache fleets). On the other hand, smaller cores tend to be less powerful, tipping the balance back to x86 for production workloads.

Our interpretation of Amazon’s announcement is that they have a new chip out, but that it is not ready to take over the world.

It turns out the big challenge for building a server CPU is the software. Intel has 30 years of ‘helping’ people port their software to x86 and optimizing performance on those chips. The porting piece is now fairly simple, basically re-compiling. However, the optimizations take a lot more work and require software creators to dig deep into their code. Amazon appears to have a product that has reached that break point between compiling-is-easy and optimization-is-hard. Amazon essentially admits this in their announcement:

If your application is written in a scripting language, odds are that you can simply move it over to an A1 instance and run it as-is. If your application compiles down to native code, you will need to rebuild it on an A1 instance.

The implication is that if you have written code that is heavily optimized for x86, which is a lot of what matters in this market, you will have to ‘rebuild’ it, and that is no small task. We would actually question the “move it over” part for scripting languages. From what we know, even scripting languages have some drawbacks running on ARM because the scripting interpreter itself needs to be optimized for Graviton, and very few companies have done this.
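For readers who want to see the distinction Amazon is drawing, here is a minimal sketch in Python of the basic check involved. The names `ARCH_ALIASES`, `normalize_arch` and `needs_rebuild` are our own hypothetical helpers, not anything from Amazon, and the alias table only covers the handful of machine strings we are sure of. It says nothing about the harder problem of optimization; it only shows why native code built for one architecture family cannot simply be copied to another.

```python
import platform

# Hypothetical alias table (our assumption): common strings reported by
# platform.machine() mapped to an architecture family.
ARCH_ALIASES = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "x86_64",
    "amd64": "x86_64",
}


def normalize_arch(machine: str) -> str:
    """Map a raw machine string to an architecture family, or 'unknown'."""
    return ARCH_ALIASES.get(machine.strip().lower(), "unknown")


def needs_rebuild(built_for: str, running_on: str) -> bool:
    """Return True if native code built for one machine must be rebuilt for another.

    Conservative by design: if either side is unrecognized, assume a rebuild.
    """
    source = normalize_arch(built_for)
    target = normalize_arch(running_on)
    if "unknown" in (source, target):
        return True
    return source != target


if __name__ == "__main__":
    host = platform.machine()  # e.g. "x86_64" or "aarch64", depending on the host
    print(f"host architecture family: {normalize_arch(host)}")
    print(f"x86_64 binary needs rebuild here: {needs_rebuild('x86_64', host)}")
```

A pure script passes this test trivially because it ships no native binary of its own, but its interpreter and any compiled extensions it loads are native code, which is where our skepticism about “move it over” comes from.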

Stepping back from the weeds, our conclusion is that Amazon has merely fired its first shot in this fight. Intel has deep pockets and healthy cash flows from its legacy businesses. They are not going to lose 20% of their server revenue soon. Nonetheless, they now face a storm: problems scaling beyond 14nm; new competition from AMD (NOTE: We own a small position in AMD); a CEO search; and deep questions about their future. Add to this that now one of their largest customers is competing with them. 

Turning back to Amazon. Our chief question is just how serious are they about this? From their limited blog posts, it is clear that there are some passionate advocates of ARM servers within the AWS organization, but the company has to weigh the costs and benefits. The costs are significant. Our best guess is that they have a few hundred people internally working on silicon. It looks like the kernel of the Graviton team came from the Annapurna acquisition three years ago, but they would have to have added significantly to that small team. We have no doubt that Amazon can improve on Graviton, but the hard part is going to be building up software support. Amazon already has a massive software team that could probably handle a lot of the porting and optimization work, but that will come at the expense of their regular job of building for and supporting AWS customers. Amazon already supports all the software needed for a robust server ecosystem; the question is whether the organization will tolerate this added role. Set against this, the benefits are not entirely clear. Yes, someday ARM-based server CPUs will be able to perform many tasks more efficiently than x86 CPUs. The electricity savings alone probably more than cover the costs of the chip design team. Beyond that, however, no one really knows what the demand from customers for ARM servers will be. If Intel stays stalled at 14nm for 3 or 4 more years (which could happen), then the move to ARM servers will become unstoppable. But what is that hedge worth to a company of Amazon’s size?

This really comes down to the mindset of AWS management. Are they so convinced of the benefits of this that they are willing to keep all the other business units in line? Are they willing to devote the 1,000 engineers (or their equivalents) that it will likely require to port over every application? At this point, we have no way of knowing. From what we know of the server market, we think they should take this risk. It will allow them to (eventually) break free from the Intel monopoly (with some help from AMD too). Judging from the way Amazon overall views its suppliers, this is likely not the last we hear from them about Graviton and its successors.

