What if they could cut out the interface FPGAs and implement standard electrical SERDES PHYs on the WSE, with optical transceivers outside, as others are doing? Do you think this would make engineering and economic sense, given that on-wafer optics is so difficult?
Good question. I alluded to why they can't put SerDes PHYs directly on the WSE but did not explain it clearly. The problem is that the left edge of each die has to carry the IO PHY, which wastes a lot of area. Essentially, Cerebras is extremely sensitive to IO PHY area because the PHY gets duplicated in between dies, over and over, on every left edge. With a bonded optical IO wafer, they could use TSVs and avoid wasting that area.
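To put rough numbers on the duplication argument, here is a back-of-envelope sketch. Every figure in it is an illustrative assumption, not Cerebras data: a hypothetical 7 × 12 grid of reticle-sized dies, each 26 mm wide and 33 mm tall, with a 1 mm PHY strip on the left edge of every die, and only the leftmost column's strips facing the wafer edge where they could actually be used.

```python
# Back-of-envelope estimate of IO PHY area wasted by per-die duplication.
# All parameters are hypothetical illustrations, not Cerebras figures.

def phy_area_overhead(die_width_mm: float, phy_strip_mm: float) -> float:
    """Fraction of each die's area taken by a PHY strip along its left edge."""
    return phy_strip_mm / die_width_mm


def wasted_area_mm2(cols: int, rows: int, die_height_mm: float, phy_strip_mm: float) -> float:
    """Area spent on PHY strips that cannot do useful work.

    Assumes every die is printed from the same reticle (so every die gets a
    strip) but only the leftmost column sits at the wafer edge and can use it.
    """
    total_strips = cols * rows
    usable_strips = rows  # assumption: leftmost column only
    return (total_strips - usable_strips) * die_height_mm * phy_strip_mm


if __name__ == "__main__":
    frac = phy_area_overhead(26.0, 1.0)
    waste = wasted_area_mm2(7, 12, 33.0, 1.0)
    print(f"per-die overhead: {frac:.1%}")
    print(f"dead PHY area across the wafer: {waste:.0f} mm^2")
```

With these made-up numbers, each die loses under 4% of its area, but 72 of the 84 strips are dead silicon, so the waste scales with the number of dies rather than with the amount of IO actually needed. That is the cost a bonded IO wafer with TSVs would avoid.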
You mentioned that part of the problem with bonding an optical PHY is that it is in a different technology node and possibly from a different manufacturer (TSMC vs GloFo). Could they instead bond an IO PHY wafer containing only electrical SERDES? That die could be made by TSMC in the same node. Does that make sense?
A wild idea: if the stepper could change the mask on the fly, they could print these SERDES dies directly around the edges of the wafer. Or run their wafers through a stepper twice with a different mask :-) skyrocketing costs for sure.
I think cooling is going to be a massive problem with any wafer-wafer stacking for Cerebras.
You did not go into detail about how their cooling and power system works, but I assume it's like Tesla Dojo: power from below and a massive cooling plate on top. Now if you have power from below and IO on top, how do you cool the damn thing?
Latest news from Cerebras on Llama-405B: https://cerebras.ai/blog/llama-405b-inference