Discussion about this post

Peter W.

First, thanks for being so stubbornly principled - I respect your analyses, enjoy reading your take, and also learn a thing or two from reading your articles!

Second, regarding the SRAM cache and Groq: one company with significant experience adding SRAM cache on top of or under compute dies is of course AMD, with lots of help from TSMC. However, that X3D cache (added to L3) currently maxes out at 64 MB. So, unless about twice that (on-die L3 plus X3D) is enough to really make a big difference, it'll still boil down to HBM or, maybe, GDDR7. That all being said, maybe someone here knows: are there any useful LLM distills that could run entirely resident in ~128 MB of L3?

Speculation on my part: whatever Nvidia "Groqs" up now is mostly for client-side AI, not these AI accelerator farms. I believe there is significant interest in on-prem inferencing, especially if the IP involved is sensitive. With the right LLM distills fine-tuned to give plenty of tokens, there might just be a market for something significantly bigger than a bunch of DGX Sparks but smaller than racks full of B300s, and also really good at inferencing with sparsity.

chrisgear

So, in sum:

1. Long NVDA, Intel, Tower Semi, SiTime

2. Short ShitComm

3. Long CPUs

4. AAOI needs an earnings-growth valuation
