zlacker

[parent] [thread] 3 comments
1. c1cccc+(OP)[view] [source] 2026-02-02 22:39:46
Radiators can shadow each other, so that puts some kind of limit on the size of the individual satellite (which limits the size of training run it can be used for, but I guess the goal for these is mostly inference anyway). More seriously, heat conduction is an issue: If the radiator is too long, heat won't get from its base to its tip fast enough. Using fluid is possible, but adds another system that can fail. If nothing else, increasing the size of the radiator means more mass that needs to be launched into space.
replies(1): >>Doctor+SA
2. Doctor+SA[view] [source] 2026-02-03 01:55:48
>>c1cccc+(OP)
please check my didactic example here: >>46862869

"Radiators can shadow each other," this is precisely why I chose a convex shape, that was not an accident, I chose a pyramid just because its obvious that the 4 triangular sides can be kept in the shade with respect to the sun, and their area can be made arbitrarily large by increasing the height of the pyramid for a constant base. A convex shape guarantees that no part of the surface can appear in the hemispherical view of any other part of the surface.

The only size limit is technological/economic.

In practice h = 3L, where L is the side length of the square base, suffices to keep the temperature below 300 K.
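The balance behind that h = 3L figure can be sketched with the Stefan-Boltzmann law. A minimal, unoptimized sketch, assuming the square base is the heated surface and only the four triangular faces radiate; the 100 kW heat load and emissivity of 0.9 are made-up illustrative numbers, not values from the thread:

```python
import math

SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def lateral_area(L, h):
    """Total area of the four triangular faces of a square pyramid."""
    slant = math.sqrt(h**2 + (L / 2) ** 2)  # apex to base-edge midpoint
    return 4 * 0.5 * L * slant

def equilibrium_temp(Q, L, h, eps=0.9):
    """Temperature (K) at which radiated power balances the heat load Q (W)."""
    A = lateral_area(L, h)
    return (Q / (eps * SIGMA * A)) ** 0.25

# Example: L = 10 m base, h = 3L, with an assumed 100 kW heat load.
L = 10.0
T = equilibrium_temp(Q=100e3, L=L, h=3 * L)  # comfortably below 300 K
```

Raising h grows the radiating area (and lowers T) without touching the base, which is the whole point of the tall pyramid.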

If heat conduction can't be managed with thermosiphons, heat pipes, or cooling loops on the satellite, why would it be possible on Earth? Think of a small-scale satellite with the same pyramidal shape, roughly h = 3L but with a much smaller L: do you actually see any issue with heat conduction there? Scaling up just means placing more of the small pyramidal sats.

replies(1): >>c1cccc+WC3
3. c1cccc+WC3[view] [source] [discussion] 2026-02-03 20:41:37
>>Doctor+SA
Kudos for giving a concrete example, but the square-cube law means that scaling area A results in A^(3/2) scaling for the mass of material used and also launch costs. If you make the pyramid hollow to avoid this, you're back to having to worry about heat conduction. You assumed an infinite thermal conductivity for your pyramid material, a good approximation if it's solid aluminum, but that's going to be very expensive (mainly in launch costs).
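The A^(3/2) claim is easy to check directly: fix the aspect ratio at h = 3L, and a solid pyramid's radiating area grows as L^2 while its mass grows as L^3. A quick sketch; aluminum density is just an illustrative number:

```python
import math

RHO_AL = 2700.0  # kg/m^3, density of aluminum (illustrative material choice)

def pyramid(L, h):
    """Radiating area (m^2) and solid mass (kg) of a square pyramid."""
    slant = math.sqrt(h**2 + (L / 2) ** 2)
    area = 2 * L * slant           # four triangular faces
    volume = L**2 * h / 3          # solid pyramid
    return area, RHO_AL * volume

# Doubling L (with h = 3L) quadruples the area but multiplies the mass by 8,
# i.e. mass scales as area^(3/2).
a1, m1 = pyramid(10.0, 30.0)
a2, m2 = pyramid(20.0, 60.0)
```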

In reality, radiator designs would probably rely on fluid cooling to move heat all the way along the radiator, rather than on thermal conduction. That avoids the above problem, but now the system, with its pipes and pumps, has to run reliably for years with zero maintenance. Doable? Yes. Easy or cheap? No. The reason cooling on Earth is easier is that we can transfer heat to air or water instead of having to radiate it away ourselves; this effectively lets us use the entire surface of the planet as our radiator. That is not an option in space, where we have to supply the radiator ourselves.

As for scaling by making many very small sats instead: I agree that this scales well from a cooling perspective, as long as you keep them far enough apart from each other. It is less great for many of the workloads we actually want a compute cluster for, which require high-bandwidth communication between GPUs.

In any case, another very big problem is that space has a lot of ionizing radiation in it, which means we also have to add a lot of radiation shielding.

Keep in mind that the on-the-ground alternative that all this extra fooling around has to compete with is just using more solar panels and making some batteries.

replies(1): >>Doctor+KL5
4. Doctor+KL5[view] [source] [discussion] 2026-02-04 12:48:00
>>c1cccc+WC3
At no point did I propose a massive block of solid aluminum. I described a heated surface and a radiating surface so that programmers understand the balance of energy flow and how to calculate the equilibrium temperature with the Stefan-Boltzmann law. If they want to explore the details, they now have enough information to generalize: they can use RMAD and run actual calculations to optimize for different scenarios.

Radiation hardening:

While there is some state information on the GPU, for ML applications the occasional bit flip isn't that critical, so most of the GPU area can be used as efficiently as before; only the critical state information on the GPU die or the host CPU needs radiation hardening.

Scaling: the didactic, unoptimized 30 m x 30 m x 90 m pyramid would train a 405B model in 17 days, and it would have 23 TB of RAM (so it can continue training larger and larger state-of-the-art models at comparatively slower rates). Not sure what's ridiculous about that. At some point people piss on didactic examples just because they want somebody to hold their hand and calculate everything for them.
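For anyone who wants to sanity-check that pyramid's heat-rejection budget rather than the training-time claim: assuming only the four triangular faces radiate and an emissivity of 0.9 (my assumption, not a figure from the thread), the numbers work out roughly as follows:

```python
import math

SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)
L, h, T, eps = 30.0, 90.0, 300.0, 0.9  # base side, height, temp, emissivity

slant = math.sqrt(h**2 + (L / 2) ** 2)  # ~91.2 m slant height
area = 2 * L * slant                    # ~5475 m^2 of radiating surface
power = eps * SIGMA * area * T**4       # heat the faces can reject at 300 K
```

Under these assumptions the four faces reject on the order of a couple of megawatts at 300 K, which is the power budget the compute inside would have to live within.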
