But even so, solving that problem feels much more attainable than it used to be.
We'll likely reach a point where it's infeasible for deep learning to completely encompass human-level reasoning, and we'll need neuroscience discoveries to continue progress. Altman seems to be hyping up "bigger is better," not just for model parameters but for OpenAI's valuation.
I assume that, thanks to the universal approximation theorem, it’s theoretically possible to emulate the physical mechanism, but at what hardware and training cost? I’ve done back-of-the-napkin math on this before [1], and the number of “parameters” in the brain is at least 2-4 orders of magnitude more than in state-of-the-art models. But that’s just the current weights; what about the history that actually enables the plasticity? Channel threshold potentials are also continuous rather than discrete, and emulating them might require full fp64, so I’m not sure how we’re even going to meet the memory requirements in the next decade, let alone whether any architecture on the horizon can emulate neuroplasticity.
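As a rough sketch of the memory side of that estimate: the comment does not give a baseline parameter count, so the 10^12 figure below is purely a hypothetical placeholder for a "state of the art" model, and `memory_bytes` is an illustrative helper rather than anything from the linked comment. It only stores the weights at fp64 and ignores any plasticity history.

```python
# Hypothetical baseline: ~1e12 parameters for a current large model.
# This number is an assumption for illustration, not from the comment.
BASELINE_PARAMS = 10**12
BYTES_PER_FP64 = 8  # one fp64 value occupies 8 bytes


def memory_bytes(baseline_params: int, orders_of_magnitude: int,
                 bytes_per_param: int = BYTES_PER_FP64) -> int:
    """Bytes needed to store the weights alone, scaled up by 10**orders."""
    return baseline_params * 10**orders_of_magnitude * bytes_per_param


# The comment's range: 2 to 4 orders of magnitude above the baseline.
for orders in (2, 3, 4):
    petabytes = memory_bytes(BASELINE_PARAMS, orders) / 1e15
    print(f"+{orders} orders of magnitude: {petabytes:g} PB at fp64")
```

Under that assumed baseline, the weights alone land somewhere between roughly a petabyte and tens of petabytes, before accounting for any state needed to model plasticity.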
Then there’s the whole problem of a true physical feedback loop with which the AI can run experiments to learn against external reward functions. The survival reward function at the core of evolution might itself be critical, but that’s getting deep into the research and philosophy on the nature of intelligence.
[1] >>40313672
If it's basically a transformer, that means it needs ~200T FLOPs per token at inference time. The paper assumes humans "think" at ~15 tokens/second, which is about 10 words/second, similar to the reading speed of a college graduate. So that would be ~3 petaFLOP/s of sustained compute.
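The arithmetic above can be checked in a couple of lines. This is just a sketch using the two figures quoted in the comment; the names are illustrative.

```python
# Figures quoted in the comment above.
FLOPS_PER_TOKEN = 200e12   # ~200T FLOPs per generated token
TOKENS_PER_SECOND = 15     # assumed human "thinking" rate (~10 words/second)


def sustained_flops(flops_per_token: float, tokens_per_second: float) -> float:
    """Sustained compute in FLOP/s needed to generate tokens at the given rate."""
    return flops_per_token * tokens_per_second


required = sustained_flops(FLOPS_PER_TOKEN, TOKENS_PER_SECOND)
print(f"{required / 1e15:.1f} petaFLOP/s")
```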
Assuming that's fp8, an H100 could do ~4 petaflops, and the authors of AI 2027 guesstimate that purpose-built wafer-scale inference chips circa late 2027 should be able to do ~400 petaflops for inference, ~100 H100s' worth, for ~$600k each for fabrication and installation into a datacenter.
Rounding, that basically means ~$6k would buy you the compute to "think" at 10 words/second. Generally speaking, that would probably work out to maybe $3k/yr after depreciation and electricity costs, or ~30-50¢/hr of "human thought equivalent" at 10 words/second. Running an AI at 50x human speed 24/7 would cost ~$23k/yr, so 1 OpenBrain researcher's salary could give them a team of ~10-20 such AIs running flat out all the time. Even if you think the AI would need an "extra" 10 or even 100x in terms of tokens/second to match humans, that still puts you at genius-level AIs that are, in principle, runnable at human speed for 0.1 to 1x the median US income.
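A minimal sketch of the per-"human" cost arithmetic, using only the figures quoted in the thread (~400 petaflops and ~$600k per chip, ~3 petaFLOP/s per human-speed stream, ~$3k/yr running cost). The variable names are illustrative, and the unrounded hardware share comes out a bit under the ~$6k quoted, in the same ballpark.

```python
# Figures quoted in the thread.
CHIP_FLOPS = 400e15         # ~400 petaflops per wafer-scale inference chip
CHIP_COST_USD = 600_000     # ~$600k fabricated and installed
HUMAN_FLOPS = 3e15          # ~3 petaFLOP/s to "think" at ~10 words/second
ANNUAL_COST_USD = 3_000     # ~$3k/yr after depreciation and electricity
HOURS_PER_YEAR = 24 * 365   # running around the clock

# How many human-speed streams one chip could serve, ignoring utilization losses.
humans_per_chip = CHIP_FLOPS / HUMAN_FLOPS

# Share of the chip's cost attributable to one human-speed stream.
capex_per_human = CHIP_COST_USD / humans_per_chip

# Hourly cost of one human-speed stream running 24/7.
hourly_cost = ANNUAL_COST_USD / HOURS_PER_YEAR

print(f"human-equivalents per chip: {humans_per_chip:.0f}")
print(f"hardware cost per human-equivalent: ${capex_per_human:,.0f}")
print(f"running cost: ${hourly_cost:.2f}/hr")
```

The hourly figure falls inside the 30-50¢/hr range given above.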
There's an open question whether training such a model is feasible in a few years, but the raw chip-level compute to plausibly run a model that large at enormous speed and low cost already exists (at the street price of B200s it'd cost ~$2-4/hr-human-equivalent).
And I think training is similar: training is capital-intensive and therefore centralized, but if 100M people are paying $6k for their inference hardware, add on $100/year as a training tax (er, subscription) and you’ve got $10B/year for training operations.
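The subscription math is simple enough to sanity-check directly. A tiny sketch, with a hypothetical helper name and the two numbers from the comment:

```python
def annual_training_budget(subscribers: int, fee_per_year_usd: int) -> int:
    """Total yearly revenue available for training if every subscriber pays the fee."""
    return subscribers * fee_per_year_usd


# 100M hardware owners each paying a $100/year "training tax".
budget = annual_training_budget(100_000_000, 100)
print(f"${budget / 1e9:.0f}B/year")
```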
EDIT: holy crap I just discovered a commonly known thing about exponents and log. Leaving comment here but it is wrong, or at least naive.