Hot Chips 2025: Irrational Recap
Spicy conference.
Irrational Analysis is heavily invested in the semiconductor industry.
Please check the ‘about’ page for a list of active positions.
Positions will change over time and are regularly updated.
Opinions are the author's own and do not represent past, present, and/or future employers.
All content published on this newsletter is based on public information and independent research conducted since 2011.
This newsletter is not financial advice and readers should always do their own research before investing in any security.
Feel free to contact me via email at: irrational_analysis@proton.me
Welcome to the second annual Hot Chips Irrational Recap.
This year, there was a lot of spicy drama.
Jar Jar Binks of semiconductor analysis (MESSA) is here to provide differentiated, entertaining, and somewhat retarded coverage.
Messa only cover presentations that were interesting and on topics Messa can write about.
This means zero coverage of the kernel programming section. Messa stupid and can barely script slow shit code in Python and MATLAB. Tri Dao flash attention 4 interesting but Messa understand nothing. Youssa go elsewhere for that kind of coverage.
Grouping presentations by section instead of quality like last year because there is a lot of “cross-pollination” of concepts.
Also sections re-ordered such that topics that interest me are first.
Due to certain events and perceived conflicts of interest, I have decided to withhold some material.
The original draft still exists.
Contents:
<redacted>
Networking
Intel Smart NIC
AMD Smart NIC
Nvidia Fast Smart-ish NIC
Broadcom Switch: Why the fuck does UALink exist?
Machine Learning
Marvell Memory
d-Matrix
Huawei
Google TPU 6p/7 aka Ironwood
Rack-Scale Roundup
Miscellaneous
Intel Clearwater Forest
Strange RISC-V CPUs
Fully Homomorphic Encryption RISC-V Accelerator
IBM Power 11
Weird 3D Printed Heatsinks
Useless Keynotes
[2] Networking
Last year, Eric Quinnell won the entire conference with his hilarious and highly informative talk on Tesla Transport Protocol for Dojo.
Dojo is now dead so very sad.
[2.a] Intel Smart-NIC
I guess x86 is so irrelevant now that all the infrastructure apps support ARM better.
Ah yes, you have an RDMA engine but shared ZERO performance data. Nice.
[2.b] AMD Smart-NIC
AMD bought Pensando in 2022 for $1.9B and has proceeded to mis-manage it into oblivion. This Pollara NIC is two years behind Nvidia ConnectX, has shit performance, apparently is a semi-custom Broadcom project and thus has shit margins, and nobody wants to use it other than shitco Neoclouds getting fucking debt forgiveness in exchange. LOL
(also Oracle)
Guys, you can't just remove the Y-axis completely. I cannot take you jokers seriously when you show bullshit like this.
40% gain can still be shit if the baseline is tiny.
[2.c] Nvidia Smart-ish NIC
The real Nvidia smart NIC is called Bluefield and nobody wants to buy it.
Instead, we have ConnectX.
I find it IMMENSELY amusing that Nvidia presented this right after AMD. At least three AMD people asked questions and were basically in shock.
The look on their faces was “oh fuck we suck… their numbers are real?”.
AMD Employee: <asks about test setup>
Nvidia Presenter: <confidently explains test setup>
AMD Employee:
Once again I am reminded of losing money trying to short Astera Labs, the cockroach of semiconductors.
Hey AMD jokers, this is what real performance looks like.
[2.d] Broadcom Switch: Why the fuck does UALink exist?
UALink people like to think that Ethernet has 400 ns of latency and thus they have an intrinsic advantage at around 200 ns.
Broadcom showed off excellent performance data for their Tomahawk 5 Ultra. It is very reasonable that the next spin of the Tomahawk 6 will have this digital logic ported over. So H2 2026 Broadcom will deliver 200G per lane SerDes and all the nice low-latency and collectives features in UEC and SUE while UALink losers struggle to ship anything.
Reduced header support for lower latency.
Broadcom already supports collectives in a high-volume shipping product while the UALink losers continue to debate what to include in their worthless joke of a standard.
Latency is great. What argument does UALink have left?
The 200G SerDes of future (2027 lol) UALink switches will likely need AEC or on-PCB retimers. That immediately causes the latency of a UALink system to be worse than a SUE/UEC Broadcom system.
[3] Machine Learning
[3.a] Marvell Memory
I was going to try and be nice to Marvell for once but their hilarious earnings implosion and legendary earnings call make this very difficult.
On a whim, I threw out a sell short order for 1K shares at $72 limit and it was surprisingly filled. Made more than a month's rent in under two hours, closed the short, and was happy.
Missed out on an extra two months of free rent LMAO.
This earnings call is without a doubt the worst one I have ever listened to. The entertainment value is incredible. Do yourself a favor and listen to the audio itself. Transcript does not do it justice.
https://event.choruscall.com/mediaframe/webcast.html?webcastid=Kd2z6aT9
Let’s start with the Hot Chips presentation, literally days before this earnings call.
500 mV SRAM! Super impressive.
Shame for pulling an AMD and not including a properly labeled y-axis.
Also, why are you showing performance at -40C? You should show at 90/110/125C for more realistic operation.
This is exceptional data! Great job.
All PVT corners across a reasonably large sample size.
Interesting that the fast -40C condition seems to be the worst.
Excellent results. Notice how there is some variation in the bathtub curve across lanes.
This Marvell D2D IP puts Synopsys, Cadence and Alphawave UCIe teams to shame. Congratulations to everyone who worked on this.
Alright, earnings call time.
Half of the hedge funds already knew Marvell had lost Trainium 3 to Alchip. The other half found out when they saw Marvell’s guidance.
They are consolidating all their half-dead businesses into one segment to obfuscate. Reducing information provided to investors is always a bad thing. I called out AMD when they deleted gaming as a reportable segment and lumped into consumer to obfuscate the ass-whooping from Nvidia/Geforce. What Marvell is doing now is same thing. Hiding bad information.
Translation: Did you lose Trainium 3?
Once again they are asked about Trainium 3 and chose to pivot to XPU attach.
Marvell management appears to not understand that Nvidia Spectrum-XGS competes with Broadcom Jericho. Marvell does not have a product in this category. They are struggling to ship basic Ethernet switches. Forget about the fancy buffered routers that have tons of high-end software features.
Vivek Arya is trying to pin down Marvell and force them to admit they lost Trainium 3.
Matt Murphy chooses to pivot into the half dead carrier and enterprise networking business “recovery”.
Bro if those businesses are doing well why you lumping them all into a single reportable segment?
SORRY I HAVE TO ASK AGAIN. DID YOU LOSE TRAINIUM 3 OR IS THERE A SMALL PAUSE BETWEEN TRAINIUM 2.5 AND 3?
Timothy Arcuri is trying to be clever here. Optics and AI semi-custom are both categorized as AI so he tried to ask about optics revenue growth and the optics baseline to force Marvell to give away info on AI semi-custom.
After some initial bullshit from Matt Murphy, Timmy Arcuri tried to push back and force a real answer.
Instead he got “I don’t have the spreadsheet in front of me”… 🤡
Harsh Kumar attempts to get useful information about the mythical XPU attach revenue that is coming any day now.
He does not get a straight answer.
The tone in which he asked his question is telling.
Also… yea there is a lot of controversy. Alchip says they won your most important socket and you are too spineless to admit this. You knew the guidance was going to finally confirm this. Why are you still trying to hide this development? Stock is down 18% in one day because everybody knows! YOU CANNOT HIDE THE IMPLICATIONS OF THE GUIDANCE NUMBERS.
Harlan is pissed. LOL
Trainium 4 is being dual tracked by Amazon.
One version uses NVLink Fusion, the other version is UALink.
Astera Labs will make the UALink switch if Amazon chooses that option. Nvidia (lol) makes Trainium 4 scale-up switch otherwise. Marvell will not get this “XPU attach” scale-up switch socket.
Bridge with Amazon appears to be burnt to a crisp.
Alchip is making the main die for both Trainium 4 tracks.
Also, I know the two “emerging hyperscaler wins” Marvell claims.
I genuinely do not understand why Matt Murphy is still trying to obfuscate. Just say you lost Trn3 socket and pivot to Microsoft/Maia.
Marvell has some pockets of great engineering. Their 64G D2D PHY and dense SRAM are excellent. Marvell SerDes was good but has slipped meaningfully over the last 2 years.
It’s honestly quite sad what is happening. Maybe an activist investor needs to jump in. I hear of rather extreme levels of attrition within Marvell engineering.
In the meantime, super fun trading instrument. Really enjoying day-trade shorting Marvell. It’s a shame Marvell employees legally cannot short their own stock and use the proceeds to buy Broadcom stock instead.
Is it legally possible for Marvell’s board to change the buyback authorization to buy AVGO shares instead of buying back MRVL shares?
We have random companies becoming Bitcoin and Ethereum treasuries. Why can’t Marvell become a Broadcom stock treasury?
(can one of the hedge fund readers submit this proposal to the Marvell annual meeting vote lmao)
[3.b] d-Matrix
While listening to the d-Matrix presentation, two thoughts came to mind.
First, how is d-Matrix going to compete with upcoming HBM4 base die architectures?
https://semianalysis.com/2025/08/12/scaling-the-memory-wall-the-rise-and-roadmap-of-hbm/
Second, this looks like an SRAM machine (Cerebras, Groq). SRAM is possibly the worst scaling vector in semiconductors right now.

Let’s start with the microarchitecture.
Mentally, I consider “in memory compute” to mean “math inside a DRAM bank”.
This is more like “in SRAM compute”. Effectively d-Matrix has designed custom SRAM cells that have compute woven inside. “Ultra tight integration of compute and SRAM” is a more accurate description of what d-Matrix has done.
VLIW == compiler is more complex
At least it is only 4-wide.
What d-Matrix refers to is how the MX standardized numerical formats alongside block FP enable near regular FP accuracy.
Pushed back on this and Sudeep shared a d-Matrix paper and a Microsoft Neurips paper to back up this claim.
https://proceedings.neurips.cc/paper/2020/file/747e32ab0fea7fbd2ad9ec03daa3f840-Paper.pdf
https://arxiv.org/pdf/2210.05470
“The Microsoft paper published at Neurips is also good independent validation of block floating point numerical accuracy.
MSFP12 is our [d-Matrix] 4 bit format. MSFP16 is our [d-Matrix] 8 bit format.”
— Sudeep Bhoja, d-Matrix CTO and Co-Founder
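For readers who have not run into block floating point before, here is a minimal sketch of the idea: every value in a block shares one exponent, and each value keeps only a small signed integer mantissa. This is my own toy illustration, not the exact MX or MSFP bit layout and not d-Matrix's implementation. The function names, the block contents, and the rounding/clamping choices are all assumptions.

```python
import math

def bfp_quantize(block, mantissa_bits):
    """Quantize floats to one shared exponent plus signed integer mantissas."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so 2**e sits just above max_abs.
    shared_exp = math.frexp(max_abs)[1]
    # One bit is spent on the sign, the rest on magnitude (assumed layout).
    levels = 2 ** (mantissa_bits - 1)
    scale = 2.0 ** shared_exp / levels
    mantissas = []
    for x in block:
        q = round(x / scale)
        q = max(-(levels - 1), min(levels - 1, q))  # clamp to representable range
        mantissas.append(q)
    return shared_exp, mantissas

def bfp_dequantize(shared_exp, mantissas, mantissa_bits):
    """Rebuild approximate floats from the shared exponent and mantissas."""
    levels = 2 ** (mantissa_bits - 1)
    scale = 2.0 ** shared_exp / levels
    return [q * scale for q in mantissas]

# Hypothetical block of weights, quantized at two mantissa widths.
block = [0.9, -0.31, 0.07, 0.55]
for bits in (4, 8):
    exp, mants = bfp_quantize(block, bits)
    approx = bfp_dequantize(exp, mants, bits)
    worst = max(abs(a - b) for a, b in zip(block, approx))
    print(bits, exp, mants, worst)
```

The accuracy argument in the papers boils down to this: values inside a block usually have similar magnitude, so sharing the exponent costs little, and the worst-case error shrinks quickly as mantissa bits are added.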
So let’s talk about the roadmap because d-Matrix needs to compete with custom base die HBM4.
The core argument d-Matrix was pushing is that their in-development stacked/3D DRAM massively wins.
d-Matrix Claims:
Their 3D DRAM will be 60% cheaper than HBM4 once the supply chain is ready.
Their 3D DRAM uses 90% less energy for IO transport compared to HBM4.
Claim #1 is believable without much need for thinking. HBM yield is severely harmed by the compounding effect of stacking 12-16 layers. d-Matrix 3D DRAM roadmap only goes up to 4 layers because they are prioritizing energy efficiency and bandwidth over capacity.
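The compounding effect is easy to see with a back-of-envelope sketch. This assumes every layer (die plus bonding step) survives independently with the same probability, and it ignores repair and known-good-die screening. The 98% per-layer figure is a made-up number for illustration, not data from d-Matrix or any memory vendor.

```python
def stack_yield(per_layer_yield, layers):
    """Probability that every layer in the stack is good (independence assumed)."""
    return per_layer_yield ** layers

PER_LAYER_YIELD = 0.98  # hypothetical, for illustration only

for layers in (4, 12, 16):
    print(f"{layers:2d} layers -> {stack_yield(PER_LAYER_YIELD, layers):.1%} stack yield")
```

Whatever the real per-layer number is, a 4-high stack pays the penalty four times while a 12-16 high HBM stack pays it 12-16 times, which is the directional point behind the cost claim.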
Moving on to claim #2…
Sudeep showed me an old Nvidia ISSCC paper that directionally is similar to what d-Matrix is trying to do.
Let’s compare the energy efficiency of HBM4 vs d-Matrix stacked DRAM.
Traditional HBM4 will have 2.5 to 4 pJ/bit system-level efficiency. Cross-checked the numbers they gave me and it seems reasonable.
So the energy efficiency claims in the Hot Chips slides are reasonable given how physically short the datapath is.
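To put pJ/bit into watts, multiply by the bit rate. The sketch below does that for the 2.5 to 4 pJ/bit HBM4 range quoted above and for a 90% reduction per the d-Matrix claim. The 1 TB/s bandwidth is a hypothetical round number I picked for the arithmetic, not a figure from the slides.

```python
def io_power_watts(pj_per_bit, bandwidth_tb_per_s):
    """Data-movement power for a given energy per bit and sustained bandwidth."""
    bits_per_second = bandwidth_tb_per_s * 1e12 * 8  # TB/s -> bits/s
    return pj_per_bit * 1e-12 * bits_per_second

BANDWIDTH_TB_S = 1.0  # hypothetical sustained bandwidth

for hbm4_pj in (2.5, 4.0):
    stacked_pj = hbm4_pj * 0.10  # "90% less energy" claim applied to the HBM4 number
    print(
        f"HBM4 {hbm4_pj} pJ/bit -> {io_power_watts(hbm4_pj, BANDWIDTH_TB_S):.1f} W, "
        f"stacked {stacked_pj:.2f} pJ/bit -> {io_power_watts(stacked_pj, BANDWIDTH_TB_S):.1f} W"
    )
```

Power scales linearly with bandwidth, so the gap in watts grows as fast as the bandwidth does.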
One rather obvious issue is the locality of memory. In a traditional HBM-style system, all compute cores can access all DRAM banks.
d-Matrix 3D DRAM cannot do this. Each DMIC core can only access the DRAM banks directly above it. This is a tradeoff between compiler complexity (compiler needs to schedule data movement) and energy efficiency.
Another question is how the supply-chain will adapt to d-Matrix needs. They need customization to the DRAM (LPDDR) die itself. d-Matrix refused to tell me which memory manufacturer is helping them but a quick look at their website suggests…
Anyway, they have real test chips with this tech so the initial hurdle of convincing one of the big 3 DRAM players to help them has been overcome.
Nothing is going to convince me that small language models will be economically viable. Medium-sized language models… sure.
d-Matrix claims their 3D DRAM can achieve the following key metrics over normal HBM4:
60% lower cost once the supply chain is ready.
90% lower system-level energy consumption for data movement.
2X the capacity per layer.
Insane bandwidth advantage of 100X.
I believe these claims are credible. This is enough capacity to run medium-sized language models at incredible throughput and high batch sizes.
Very exciting. It could actually work. Looking forward to seeing more progress from d-Matrix.
[3.c] Huawei
The original presentation was going to talk about the Ascend AI accelerators. At the last minute, the slides were pulled and something completely different was presented.
Also, the Huawei guy apparently joined Q&A from his bed. Background blur was there but I legit think this dude was taking questions while chilling.
Interesting. One proprietary standard to rule them all.
Good explanation of the motivation behind what they are doing.
Huawei is running 800G LPO using 7nm class process technology SerDes. Very impressive.
Western firms are just starting to get 800G LPO to work using much better 3nm class SerDes.
Great explanation of LLR redundancy schemes.
[3.d] Google TPU 6p/7 aka Ironwood
6 connection points (+/- xyz) is important to remember.
4x4x4 cube is electrical only. OCS only hits when you connect multiple 4x4x4 cubes.
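A quick way to internalize the 6-port, 4x4x4 picture is to count the links. The sketch below walks every chip in one cube, looks at its six +/- xyz neighbors, and splits ports into "stays inside the cube" versus "points out of a cube face". Treating the face ports as the ones that go to OCS when multiple cubes are connected is my assumption based on the description above. The names are mine.

```python
from itertools import product

CUBE = 4
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def classify_links(cube=CUBE):
    """Return (internal links, outward-facing ports) for one cube of chips."""
    internal_ports = 0  # neighbor is inside the same cube
    face_ports = 0      # neighbor would be outside the cube
    for x, y, z in product(range(cube), repeat=3):
        for dx, dy, dz in DIRECTIONS:
            neighbor = (x + dx, y + dy, z + dz)
            if all(0 <= c < cube for c in neighbor):
                internal_ports += 1
            else:
                face_ports += 1
    # Each internal link was counted once from each end.
    return internal_ports // 2, face_ports

print(classify_links())  # (144, 96)
```

So one cube has 64 chips x 6 ports = 384 ports: 288 of them pair up into 144 in-cube links, and 96 face outward.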
The SparseCore is very interesting, primarily because of its location within the higher level block diagram.
It’s on a separate node within the on-chip ring NoC. Allows for clever functional offload. “In network compute” happens on the main chip network, not the NIC or switch. This is because OCS is an ultra “dumb” switch with little to no computational flexibility.
Very interesting system design.
The presenter verbally described the SDC mitigation strategy in a vague manner. Something about pre-job self-testing and even runtime based mathematical verification.
Sounds like there is some dedicated hidden digital logic that monitors runtime floating-point math.
The compute is VLIW based. Google is the only company in the world that has a good VLIW compiler.
6x8 = 48 lanes of 112G SerDes going across a wide variety of electrical channels. From KR reach PCB channels with multiple connection points to C2M channels to optical modules.
If you think the MediaTek jokers can make their 224G SerDes work at this scale… LOL.
You can see from the PCB that there are a wide variety of channel reaches. Designing one SerDes that can handle all possible channels is EXTREMELY difficult.
The nonstop rumors from Taiwan rumor mill regarding MediaTek taking TPU are still bullshit. Getting tired of explaining the same concepts over and over.
Maybe Google will make a prefill-style chip to copy Nvidia Rubin CPX
https://semianalysis.com/2025/09/10/another-giant-leap-the-rubin-cpx-specialized-accelerator-rack/
MediaTek could get their 200G working well enough to make a prefill-only chip using the following strategies.
Run the 200G SerDes in 100G mode. (brute force)
Tune the crap out of their 200G to make it work really well with Google’s specific TPUv8 PCB design.
Adjust fixed and floating FFE tap ratio.
Boost CTLE gain at Nyquist and accept the SerDes won’t function at short channel reaches.
Use SiTime chips to PPM sync entire compute blade.
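To illustrate the fixed versus floating FFE tap item, here is a toy sketch under heavy assumptions. An FFE is just an FIR filter. "Fixed" taps sit at consecutive delays next to the cursor, while "floating" taps can be parked at a far-out delay to cancel a reflection from one specific channel. The pulse response, tap values, and delays below are invented for illustration. Real SerDes adapt these with far more taps (including pre-cursor taps, which this sketch skips).

```python
def apply_ffe(pulse, taps):
    """Convolve a sampled pulse response (one sample per UI) with sparse FIR taps.

    taps maps delay in UI -> coefficient.
    """
    out = [0.0] * (len(pulse) + max(taps))
    for delay, coeff in taps.items():
        for i, sample in enumerate(pulse):
            out[i + delay] += coeff * sample
    return out

def residual_isi(pulse, cursor=0):
    """Sum of absolute non-cursor samples, a crude ISI metric."""
    return sum(abs(s) for i, s in enumerate(pulse) if i != cursor)

# Hypothetical channel: main cursor, one post-cursor, and a reflection 10 UI later.
pulse = [1.0, 0.3] + [0.0] * 8 + [0.2]

fixed_taps = {0: 1.0, 1: -0.3}   # taps adjacent to the cursor
floating_taps = {10: -0.2}       # tap placed on top of the reflection

print("raw:             ", residual_isi(pulse))
print("fixed only:      ", residual_isi(apply_ffe(pulse, fixed_taps)))
print("fixed + floating:", residual_isi(apply_ffe(pulse, {**fixed_taps, **floating_taps})))
```

The catch is that a floating tap position tuned for one board's reflection is useless on a channel where the reflection lands somewhere else, which is why this kind of tuning only works against a specific PCB design.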
Anyone who is worried about Broadcom losing Google TPU because of fucking MediaTek needs to chill.
I have like $20B+ worth of AVGO stock holders (various institutional investors combined) who keep asking me the same question over and over. Chill guys. Google is playing games because they know Hock has them locked in.
[4] Rack-Scale Roundup
Half a day was dedicated to rack-scale stuff. Some interesting bits from several presentations. Going to lump them all into one section.
Remember the 4x4x4 block internal plumbing is all electrical and needs ultra-high end SerDes that can handle extremely short and extremely long channel reaches.
Managing this many fiber cables in such a small space is apparently very difficult.
All I could think about when seeing this slide is how fucked over Semtech was by ACC rack architecture changes. LOL
Either these cross-rack cables are DAC or the volume of Catalina was way lower than what Semtech was hoping for.
Meta has lots of workloads (particularly recommendation network inference) that thrive on CPU compute for embedding table math and a higher DRAM to compute ratio from more LPDDR.
Those copper cables in the middle look extraordinarily thick. Maybe the skin effect is what fked Semtech.
These NVLink spine mechanical stabilizers are apparently very difficult to manufacture and required ultra specialized CNC machines.
Ok… so your link budget analysis has so much variation that it is basically useless.
Also reflections reflections reflections. I hate it when novices focus on insertion loss and fail to mention channel complexity/reflections.
We gonna need cheaper co-packaged optics that drop the Marvell DSPs and use redundant lasers in OIF ELSFP form factor.
Whoever made this slide does not know what they are talking about. All of the power numbers are way too high. For example, passive copper can easily hit 3.5 pJ/bit for long channels and 2.5 pJ/bit for short channels.
Optical transceiver pJ/bit is very easy to calculate.
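The calculation is just module power divided by data rate. A minimal sketch, where the 16 W and 800 Gbps inputs are hypothetical placeholder values and not numbers from the slide or from any vendor datasheet:

```python
def picojoules_per_bit(power_watts, data_rate_gbps):
    """Energy per bit for a link that burns power_watts while moving data_rate_gbps."""
    joules_per_bit = power_watts / (data_rate_gbps * 1e9)
    return joules_per_bit * 1e12

# Hypothetical module: 16 W at 800 Gbps.
print(f"{picojoules_per_bit(16.0, 800.0):.1f} pJ/bit")
```

Plug in the power from a real datasheet and the line rate, and you can sanity check any slide that quotes pJ/bit for optics.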
[5] Miscellaneous
Mostly CPU stuff that most people don’t care about. (I still care…)
[5.a] Intel Clearwater Forest
The Intel presenter repeatedly mentioned 18A and how it is real, this Clearwater Forest product is real, and will be shipping soon. They even underlined 18A repeatedly to hammer this home.
Someone in Q&A literally asked this dude if 18A is still on the roadmap. On the one hand, this is a rather rude question and people should not troll like this at Hot Chips.
On the other hand, this shit was hilarious. Look at the faces the Intel presenter made.
This is the face of a man who is thinking “what the fuck did you just ask me you little shit?” while trying VERY HARD to remain professional.
Much wider decode and OOO engine.
Ok they just made everything bigger it seems.
17% IPC uplift is nice but uh… that L2 latency looks a bit high?
It was clarified in Q&A that standard (not MRDIMM) DDR5 8000 is natively supported. Nice!
[5.b] Strange RISC-V CPUs
There were a few interesting/exotic RISC-V CPUs presented. Let's start with Andes/Condor.
What if we took the pain and suffering of VLIW style compilers and used that to make a RISC-V CPU?
A compiler efficiently scheduling branchy and dynamic CPU code. HMMMMM WHY DO I DOUBT THIS CLAIM???
Well researched? More like “lots of others have tried and they all dead now.”
The presenter tried to brush off 30% wasted instructions from dependency induced replays as “a reasonable impact”. LOL
Next we have PEZY.
Ok… this looks like a GPU but go on…
No branch predictor or OOO engine.
Interesting. Some internal NoC based compute.
Well that is a very high area allocation to SRAM.
Overall an interesting architecture.
Also they are developing their own HDL to compete with SystemVerilog. Great ambition guys. Best of luck.
[5.c] Fully Homomorphic Encryption RISC-V Accelerator
This slide explains what FHE does pretty well. Encrypt data such that you can still run (some) math on it while encrypted without messing up decryption. Process data in a totally private manner.
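To make "math on encrypted data" concrete, here is a toy sketch using Paillier. To be clear about what is swapped in: Paillier is only additively homomorphic, so it is NOT fully homomorphic encryption and has nothing to do with how the Presto chip works. The tiny primes are hopelessly insecure and chosen only so the numbers stay readable. Function names are mine.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy Paillier key generation with tiny (insecure) primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the modular inverse of L(g^lam mod n^2), where L(u) = (u - 1) // n.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m, rng=random):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen()
c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
print(decrypt(pub, priv, add_encrypted(pub, c1, c2)))  # 42
```

Whoever runs `add_encrypted` never sees 20, 22, or 42. FHE schemes extend this to both addition and multiplication, which is where the heavy compute comes from.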
Each desired mathematical operation needs its own FHE strategy.
Good overview of the objective for this Presto chip. One design that can run all the FHE strategies.
Three blocks of custom instructions. Interesting that they have some custom load/store instructions. Seems to be for 128-bit operands?
512-dim instructions ARE BACK BABY.
I would have preferred benchmarks against a GPU. Say a weak Nvidia A10…
This is a highly parallel workload that probably competes with GPU acceleration in the real world?
[5.d] IBM Power 11
Pretty sure this is the only example of someone using Samsung Foundry advanced packaging.
This is really cool. IBM has a universal memory interface that can attach to any kind of DRAM across multiple generations. It is SerDes (38.4 Gbps per lane) based and includes significant optimizations down to the CPU core uarch to mitigate latency issues. The presenter said the SerDes is synchronous with the DDR on the buffers so the pipeline is very long yet predictable.
Effectively, the latency penalty after all of these mitigations is only 6-8 nanoseconds. This is an incredible result! Great job everyone who worked on this.
[5.e] Weird 3D Printed Heatsinks
Ever wondered if it was possible to 3D print copper heatsinks?
Well… someone has done this.
This is super cool. Regular finned copper coldplates are limited to straight lines and this causes turbulence issues. Fabric8 has apparently developed a process that can create solid copper coldplates with arbitrary features.
Incredible expansion in the design space for thermal solutions.
Two-phase immersion cooling is mostly dead. The fluids needed are ultra toxic “forever” chemicals that 3M has discontinued.
To say nothing of the maintenance nightmare that is immersion cooling.
Still interesting.
I doubt their claim of “cost competitiveness” with traditional coldplate manufacturing is legit. Still, I can see some people paying a premium to design turbulence-resistant DLC coldplates with extreme flexibility provided by ECAM.
[6] Useless Keynotes
Last year, Trevor Cai (the OpenAI guy) gave an excellent keynote presentation. Very informative, educational, and technical.
This year, we got two useless keynotes that were a complete waste of time.
I was going to pick apart these keynotes but both of the presenters seem like very nice people.


















































