10 Comments
May 23 · edited May 23 · Liked by Daud's Scout

Thanks for the interesting take, when most seem to think that ARM is overvalued.

I'm curious to see what you and others think about why no one else bought Nuvia for its actual ARM server CPU design, and instead Qualcomm, which had infamously scrapped its own server CPU effort a while ago, snagged them.

Why would any of the hyperscalers not want a world class design team actually designing an ARM server CPU? I know a server CPU is a lot more than just the main CPU core, but all of them have internal efforts anyway. Maybe not Amazon since they bought Annapurna a while ago, but it could have been a good fit for the others.

Were they afraid of Apple and the Apple lawsuit? Do they just want to focus on power-efficient ARM CPUs and expect AMD to be good enough at the high end? Did they expect ARM's own designs to catch up? Did only the legal team at Qualcomm have the chops to stand up to Apple?

author

I assume by Apple lawsuit you mean the "Gerard Williams III poaching" suit. That was a frivolous lawsuit, and it was obviously going to fail. Look at how various Qualcomm physical design engineers migrated to ARM (Ltd) in a highly correlated manner. QCOM could have filed a similar lawsuit against ARM but did not bother, because these things are impossible to prove unless someone does something very stupid in writing.

With the legal situation out of the way, let me reframe your question in two parts:

1. Why did nobody else (Google, Microsoft, ...) buy Nuvia for the server CPU design?

We have two sides to the server CPU space: high-performance (HPC) and cloud-optimized. Ampere Computing, Google/Marvell Axion, Microsoft Cobalt, and Amazon Graviton are all in the cloud-optimized category. The Nuvia server design was for HPC.

2. "They just want to focus on power efficient ARM CPUs and expect AMD to be good enough at the high end?"

Yes, I agree with your statement but want to reframe it slightly. AMD's EPYC CPU roadmap is very, very strong. It is the clear leader in HPC workloads, 3D V-Cache is a massive advantage, and chiplet-based scaling gives incredible yield. The rumored monolithic Nuvia server CPU had no chance against the great stuff AMD is putting out in HPC CPUs.
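The yield point can be illustrated with the textbook Poisson defect model, Y = exp(-D·A), where D is defect density and A is die area. This is a minimal sketch: the defect density and die areas below are hypothetical round numbers, not AMD, TSMC, or Nuvia data, and the model ignores packaging cost, I/O dies, and salvaging partially defective dies.

```python
import math

def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model: Y = exp(-D * A)."""
    return math.exp(-defects_per_mm2 * area_mm2)

# Hypothetical inputs for illustration only (not real foundry or product figures).
DEFECT_DENSITY = 0.001   # defects per mm^2 (0.1 per cm^2), assumed
MONOLITHIC_AREA = 600.0  # mm^2, assumed large monolithic server die
CHIPLET_AREA = 75.0      # mm^2, assumed small CPU chiplet
CHIPLETS = 8             # 8 x 75 mm^2 = same total compute silicon as the monolithic die

mono = poisson_yield(MONOLITHIC_AREA, DEFECT_DENSITY)
chiplet = poisson_yield(CHIPLET_AREA, DEFECT_DENSITY)

# Wafer area consumed per good part: chiplets are tested and discarded individually,
# so a defect scraps one small chiplet rather than the whole large die.
mono_area_per_good = MONOLITHIC_AREA / mono
chiplet_area_per_good = CHIPLETS * CHIPLET_AREA / chiplet

print(f"monolithic die yield: {mono:.1%}")
print(f"single chiplet yield: {chiplet:.1%}")
print(f"silicon per good part: {mono_area_per_good:.0f} vs {chiplet_area_per_good:.0f} mm^2")
```

Under these assumptions the chiplet design needs noticeably less wafer area per good part, which is the intuition behind the "incredible yield" remark.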


Right, I did mean the Apple employee poaching lawsuit. I was aware of it being frivolous, especially so in California, but I guess what I was trying to say was that the nuisance value can be quite significant (for the filer). My spouse was on the receiving end of a nuisance lawsuit, and it's hard to explain the stress it can cause depending on your personality. Not to mention the time and effort it takes out of a busy person's schedule.

I guess it would have taken some time to rework the Nuvia cores into an HPC/chiplet approach; it's a pity no one attempted it.

As an aside, what do you think is the reason behind why Apple/Nuvia were able to eke out so much performance? Just plain old hard engineering to make much wider cores? Tighter integration with TSMC process improvements due to the huge volume/scale advantage? Plain old caring more and spending more money to get it done? Or is there some secret sauce that Intel/AMD could imitate?

author

Sorry to hear about what happened to your spouse.

I feel like the problem was not the Nuvia cores but the SoC. HPC (chiplet or monolithic) needs great I/O and NoC. The market probably moved faster than Nuvia expected with Orion (its server product). The ARM lawsuit, which according to now-public filings was brewing in the background, likely slowed things down a lot.

CPU microarchitecture design is very difficult. My view is that before Nuvia existed, three companies (in no particular order) had good CPU uarch teams: Apple, Intel, and AMD. When Nuvia was created and then bought by QCOM, a fourth viable player entered. There are lots of dead CPU teams out there. Samsung fired the Mongoose team. QCOM reorged Kryo into Falkor and then fired them. The Cavium/Marvell CPU group disintegrated into nothing.

The historical patterns are distorted because ARM (the ISA) is widely licensed, while x86 is an Intel+AMD duopoly with a legally enforced moat. Both Intel and AMD have excellent uarch teams and ARM ALA licenses. They could make a great custom ARM (ISA) CPU core but have chosen not to for self-preservation reasons. I have a tinfoil-hat theory, based on leaked Microsoft slides, that AMD has an in-progress project, but we shall see.


Thanks, very interesting. Though the royalty-per-Grace-CPU calculation seems high. Royalty per core is $0.50 to $1 in the infrastructure segment, and ARM said Cobalt 100 reaches the top end ($1) because it's a subsystem with 2x royalty, so Nvidia will be towards the bottom end. That suggests possibly $37 of royalty content per Grace CPU (74 cores × $0.50 per core), which makes material near-term upside difficult even with a lot of units. And if you think that kind of upside is possible, you should probably stick to Nvidia, because that's way ahead of consensus! Any pushback?
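The commenter's estimate is simple multiplication. A minimal sketch, assuming the 74-core count and the $0.50 to $1 per-core band exactly as stated in the comment (neither is an official ARM or Nvidia figure), and also showing the per-core rate implied by the author's $105.8 base case quoted later in the thread:

```python
GRACE_CORES = 74                  # core count as stated in the comment (assumption)
LOW_RATE, HIGH_RATE = 0.50, 1.00  # USD per core, infrastructure band as stated
AUTHOR_BASE_CASE = 105.8          # USD per Grace, the author's base case in this thread

def royalty_per_chip(cores: int, rate_per_core: float) -> float:
    """Per-chip royalty if every core carries the same flat per-core rate."""
    return cores * rate_per_core

low = royalty_per_chip(GRACE_CORES, LOW_RATE)
high = royalty_per_chip(GRACE_CORES, HIGH_RATE)
implied_rate = AUTHOR_BASE_CASE / GRACE_CORES  # per-core rate needed to reach the base case

print(f"range: ${low:.2f} to ${high:.2f} per Grace")
print(f"implied by ${AUTHOR_BASE_CASE}: ${implied_rate:.2f} per core")
```

The base case implies roughly $1.43 per core, above the stated band; the author's reply (a bigger V-class core) is one proposed explanation for that gap.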

author

Cobalt is Neoverse N while Grace is Neoverse V (a bigger core), so that helps close the royalty gap. Also, a subsystem (aka CSS, aka chiplet) has worse gross margins because it is a physical product. Revenue from Grace is basically pure profit, with 98% gross margin.

Frankly, I own too much Nvidia and cannot buy more. Obviously I'm not selling or trimming, as that is against the strategy. I will sell NVDA when a viable competitor emerges (or signs of the electrical grid collapsing...), and that is not happening for the next 4-5 years minimum.

ARM and AVGO are my favorite names for my personal diversification push. Buying more Nvidia is probably a better choice for most people.


Perhaps royalty per core is a little higher, and fair point re margin on CSS. Topline upside is more limited, though. I'd also consider what happens with R100: rumour has it the design will be 4x reticle limit vs. 2x for Blackwell, which likely means the effective ratio of GPU dies to Grace will double again (as it did from Hopper to Blackwell). Grace's content, as a percentage of Nvidia's revenue and profit, will likely keep shrinking.
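The dilution argument can be sketched by spreading a fixed per-Grace royalty over a GPU-to-Grace ratio that doubles each generation. This is a hypothetical illustration: the $100 royalty is a placeholder rather than a forecast, the ratios are relative units (Hopper = 1), and the R100 value simply applies the "double again" speculation from the comment.

```python
# Hypothetical illustration: per-Grace ARM royalty held constant across generations.
ROYALTY_PER_GRACE = 100.0  # placeholder USD, not a forecast

# Relative GPU-to-Grace ratio (Hopper = 1), doubling each generation as speculated above.
GPU_TO_GRACE_RATIO = {
    "Hopper": 1,
    "Blackwell": 2,
    "R100 (rumoured)": 4,  # assumption: the ratio doubles again
}

def royalty_per_gpu_unit(royalty_per_grace: float, ratio: int) -> float:
    """Spread one Grace chip's royalty over the GPU units paired with it."""
    return royalty_per_grace / ratio

for gen, ratio in GPU_TO_GRACE_RATIO.items():
    share = royalty_per_gpu_unit(ROYALTY_PER_GRACE, ratio)
    print(f"{gen}: ${share:.2f} of ARM royalty per GPU unit")
```

If GPU content per Grace keeps doubling while the Grace royalty stays flat, ARM's take per unit of GPU halves each generation, which is the shrinking-share point made above.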

author

I think R100 will be paired with a new Grace based on Neoverse V3/V4-class IP. There are too many variables to extrapolate, so I'm only focusing on Blackwell family trends for now.


Hmm, just making sure I follow your math here. Are you basically saying ARM will make $100 × (number of GB200s)?

In NVL configurations, isn't the ratio 1 Grace per 2 Blackwell?

author

Yes, my base case is $105.8 per Grace. There are two Blackwell packages and one Grace chip per GB200, and two Blackwell dies per Blackwell package.
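The GB200 composition described in the reply can be written down directly. A minimal sketch, using only the counts and the $105.8 base case stated in this thread; the `GB200` class and its field names are hypothetical, chosen for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GB200:
    """Hypothetical model of one GB200, using the counts stated in the thread."""
    grace_chips: int = 1
    blackwell_packages: int = 2
    dies_per_blackwell_package: int = 2

    @property
    def blackwell_dies(self) -> int:
        return self.blackwell_packages * self.dies_per_blackwell_package

ARM_ROYALTY_PER_GRACE = 105.8  # USD, the author's stated base case

board = GB200()
royalty = board.grace_chips * ARM_ROYALTY_PER_GRACE  # only Grace carries ARM royalty here

print(f"Blackwell dies per GB200: {board.blackwell_dies}")
print(f"ARM royalty per GB200: ${royalty:.2f}")
print(f"per Blackwell package: ${royalty / board.blackwell_packages:.2f}")
print(f"per Blackwell die: ${royalty / board.blackwell_dies:.2f}")
```

Because there is one Grace per GB200, total ARM revenue in this model is simply the base-case royalty × GB200 units, which is what the question above was checking.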
