1) reasoning capabilities in the latest models are rapidly approaching superhuman levels and continue to scale with compute.
2) intelligence at a given level is easier to achieve algorithmically once the hardware improves. With more compute there are also more paths to intelligence, often via simpler mechanisms
3) most current-generation reasoning AI models leverage test-time compute and RL in training--both of which can readily make use of more compute. For example, RL on coding against compilers, or on proofs against verifiers.
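The "RL against a verifier" idea in point 3 can be sketched minimally: a candidate program is scored by an objective checker (here, a handful of input/output tests), which yields a reward signal with no human labeling. The task, the candidate strings, and the 0/1 reward scheme are illustrative assumptions, not any lab's actual pipeline.

```python
# Toy verifiable-reward function: execute candidate code that should
# define f(x), then reward 1.0 only if every test case passes.
def verifier_reward(candidate_src: str, tests: list) -> float:
    scope = {}
    try:
        exec(candidate_src, scope)  # "compile/run" step; any error means reward 0
        f = scope["f"]
        return 1.0 if all(f(x) == y for x, y in tests) else 0.0
    except Exception:
        return 0.0

# Two sampled "policy outputs" for a hypothetical task: implement f(x) = 2x.
tests = [(0, 0), (3, 6), (10, 20)]
good = "def f(x):\n    return x * 2\n"
bad = "def f(x):\n    return x + 2\n"

print(verifier_reward(good, tests))  # 1.0
print(verifier_reward(bad, tests))   # 0.0
```

The point is that this reward is cheap, objective, and scales with compute: sample more candidates, verify more often, and the training signal keeps coming.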
All of this points to compute now being essentially the only bottleneck to massively superhuman AIs in domains like math and coding--on the rest, no comment (I don't know what superhuman means in a domain with no objective evals).
A calculator is superhuman if you're prepared to put up with its foibles.
One: an AI capable of replacing some large proportion of global GDP (this definition faces a lot of obstructions: organizational, bureaucratic, robotic)...
Two: it becomes difficult to find problems that the average human can solve but the model cannot. The problem with this definition is that the distinct nature of AI intelligence and the breadth of tasks are such that this metric is probably only met after AI is already, in reality, massively superhuman in aggregate. Compare Go AIs, which were massively superhuman and yet often still failed to count ladders correctly--a failure that was also fixed by more scaling.
All in all I avoid the term AGI, because for me AGI means comparing average intelligence on broad tasks relative to humans, and I'm already unsure whether current models achieve that--whereas superhuman research math is clearly not achieved, since humans are still making all the progress on new results.
This is true for brute-force algorithms as well and has been known for decades. With infinite compute you can achieve wonders. But the problem lies in diminishing returns[1][2], and things do not seem to scale linearly, at least for transformers.
1. https://www.bloomberg.com/news/articles/2024-12-19/anthropic...
2. https://www.bloomberg.com/news/articles/2024-11-13/openai-go...
What would you say is the strongest evidence for this statement?
I still have a pretty hard time getting it to tell me how many sisters Alice has. I think this might be a bit optimistic.
Which is - no doubt - an astonishing achievement, but absolutely not what the "AI" hype train people try to paint it as.
The "rapidly approaching" part is true in terms of velocity, but all of this is just baby steps, while walking upright properly is still way beyond the horizon.
I wouldn't mind being wrong about this, of course.