Doctor+ (OP) 2026-02-04 12:48:00
At no point did I propose a massive block of solid aluminum. I describe a heated surface and a radiating surface so that programmers understand the balance of energy flows and how to calculate the equilibrium temperature with the Stefan-Boltzmann law. If they want to explore the details, they now have enough information to generalize: they can use RMAD and run actual calculations to optimize for different scenarios.
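
For anyone who wants to plug in their own numbers, here is a minimal sketch of that energy balance; the power, radiating area and emissivity below are illustrative assumptions, not figures from my example:

    # Equilibrium temperature of a radiating surface: at steady state the
    # dissipated power equals the radiated power,
    #   P = eps * sigma * A * T^4   =>   T = (P / (eps * sigma * A)) ** 0.25
    # (ignores absorbed sunlight, view factors between surfaces, etc.)
    SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

    def equilibrium_temperature(power_w, area_m2, emissivity=0.9):
        # Temperature (K) at which area_m2 radiates power_w into empty space
        return (power_w / (emissivity * SIGMA * area_m2)) ** 0.25

    # Illustrative inputs only: 1 MW of GPU heat over ~5,475 m^2, which is
    # roughly the lateral surface of a 30 m x 30 m x 90 m pyramid.
    print(equilibrium_temperature(1e6, 5475))  # ~245 K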

Radiation hardening:

While there is some state information on the GPU, the occasional bit flip isn't that critical for ML applications, so most of the GPU area can be used as efficiently as before; only the critical state information on the GPU die or the host CPU needs radiation hardening.
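
To make that concrete, here is a small sketch of why a stray flip in weight memory is usually tolerable while the bookkeeping state still wants protection; the array size and the mantissa-only flip are my own illustrative choices:

    import numpy as np

    # Flip one random mantissa bit in a float32 weight tensor and see how
    # much the affected weight actually changes. Exponent/sign flips are the
    # nastier (and rarer) case, and control/bookkeeping state is what still
    # wants ECC or redundancy.
    rng = np.random.default_rng(0)
    weights = rng.standard_normal(1_000_000).astype(np.float32)

    raw = weights.view(np.uint32).copy()
    idx = int(rng.integers(raw.size))   # which weight gets hit
    bit = int(rng.integers(23))         # mantissa bits are 0..22 in float32
    raw[idx] ^= np.uint32(1 << bit)
    corrupted = raw.view(np.float32)

    rel = abs(corrupted[idx] - weights[idx]) / abs(weights[idx])
    print(f"one mantissa bit flip changed that weight by {rel:.1e} (relative)")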

Scaling: the didactic, unoptimized 30 m x 30 m x 90 m pyramid would train a 405B model in 17 days and would have 23 TB of RAM (so it can continue training larger and larger state-of-the-art models at comparatively slower rates). Not sure what's ridiculous about that. At some point people piss on didactic examples because they want somebody to hold their hand and calculate everything for them.
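
For those who do want to run the numbers themselves, the arithmetic is the standard back-of-envelope: total training compute of roughly 6 * parameters * tokens, divided by sustained cluster throughput. The GPU count, per-GPU throughput, utilization and token count below are placeholder assumptions for illustration, not the exact figures behind the 17-day number:

    # Back-of-envelope training time: FLOPs ~ 6 * params * tokens,
    # divided by the cluster's sustained throughput.
    def training_days(params, tokens, num_gpus, flops_per_gpu, utilization):
        total_flops = 6.0 * params * tokens
        sustained = num_gpus * flops_per_gpu * utilization
        return total_flops / sustained / 86_400  # seconds per day

    # Placeholder assumptions only.
    days = training_days(
        params=405e9,        # 405B parameters
        tokens=15e12,        # ~15T training tokens (assumed)
        num_gpus=25_000,     # assumed accelerator count
        flops_per_gpu=2e15,  # ~2 PFLOP/s low-precision per GPU (assumed)
        utilization=0.4,     # assumed sustained utilization
    )
    print(f"~{days:.0f} days with these placeholder inputs")  # ~21 days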
