I work at a publicly traded semiconductor company. If they find out about my hobby (no money involved) I could probably talk my way out of the situation as the disclaimers at the top of each post are compatible with the employee handbook. If I charge money, 100% fired lol. Please share though. Appreciate it.
Read what your actual employment terms are. Not saying you are allowed to but at many companies you are. I would be allowed to at mine although I wouldn’t have assumed so.
The main problem is often getting the compiler to "find" optimal orderings of operations. Do you have any opinions on search-based compilation? We haven't seen any good search-based compilers come out yet (tinygrad and luminal are working toward this), but in theory you can just throw compute at the problem: search through many different equivalent computations and profile them to find the fastest one. Using techniques like e-graphs (https://egraphs-good.github.io) you can manage a large search space, and using techniques like MuZero-style MCTS you can prune it tremendously. In theory that would allow the compiler to be way simpler and much more powerful, albeit with very slow compilation times.
Notably this can't work on Turing-complete workloads (as on Itanium) because of Rice's theorem (https://en.wikipedia.org/wiki/Rice%27s_theorem), but in deep learning the space of possible computations is much more constrained: it's just linear algebra.
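To make the idea concrete, here's a minimal sketch (a hypothetical toy, not code from tinygrad, luminal, or any real compiler): enumerate algebraically equivalent evaluation orders of a matrix chain and pick the cheapest, with a FLOP-count cost model standing in for actual profiling. A real search-based compiler would explore a far richer rewrite space (e-graphs) and prune it rather than brute-force it, but the search-then-cost loop is the same shape.

```python
from functools import lru_cache

def best_order(dims):
    """Search all parenthesizations of a matrix chain A0*A1*...*An
    and return (cost, expression) for the cheapest one.
    Matrix i has shape (dims[i], dims[i+1])."""
    @lru_cache(maxsize=None)
    def cost(i, j):
        # Cheapest way to evaluate the sub-chain A_i ... A_j.
        if i == j:
            return 0, str(i)
        best = None
        for k in range(i, j):  # try every split point
            left_c, left_e = cost(i, k)
            right_c, right_e = cost(k + 1, j)
            # FLOPs for the final multiply at this split.
            c = left_c + right_c + dims[i] * dims[k + 1] * dims[j + 1]
            if best is None or c < best[0]:
                best = (c, f"({left_e}x{right_e})")
        return best
    return cost(0, len(dims) - 2)

# Shapes (10x100)(100x5)(5x50): grouping (A0xA1) first is 10x cheaper.
print(best_order([10, 100, 5, 50]))  # (7500, '((0x1)x2)')
```

All three orderings compute the same result; only the cost differs. Swap the FLOP count for a call into a profiler on real hardware and you get the "throw compute at it" version of the search.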
I think you're missing the important point that AI has far simpler control flow than almost everything else. It's almost entirely matrix multiplications and simple elementwise/reduction operations. It's much easier to cover that than to cover everything everyone might want to run on a DSP.
I really don't have a dog in the VLIW fight but … by 2024 we've solved what I think is the hardest expert problem we've ever posed, beating 9-dan go players at the very simple game of go (small number of simple rules). So I'm surprised we can't generate good VLIW code, or at least code useful enough to take advantage of VLIW's considerable strengths.
The Yale Bulldog VLIW compiler for the ELI-512 was a significant advancement in compilers with its trace scheduling. The commercial version of the compiler at Multiflow was even licensed to a bunch of companies including Intel and HP. Monica Lam's software pipelining at HP was a continuation along that line. A lot of great compiler ideas got their start in VLIW research.
I've read that Groq has punted on the idea of a general-purpose VLIW AI machine programmable by mortals with opposable thumbs and switched over to just being a model service bureau, GroqCloud. This strikes me as a good, humble idea.
really enjoy your writing style and memes! will spend the weekend reading all your posts 🤓
Memes are the DNA of the soul.
great write up, would even be worth $20 per month
Robinson Crusoe!
Great writing.
https://www.eetimes.com/groq-ceo-we-no-longer-sell-hardware/
BTW, Qualcomm has an in-tree LLVM backend for their Hexagon. It's pretty good.
Great post, a refreshing contrarian take to all the Groq hype out there. Btw, PPAP will always be pen pineapple apple pen to me lol https://youtu.be/Ct6BUPvE2sM?si=zaMbcPUUzytDCDoG