Here someone just claimed that it is "entirely clear" LLMs will become super-human, without any evidence.
https://en.wikipedia.org/wiki/Extraordinary_claims_require_e...
The way you've framed it seems like the only evidence you will accept is after it's actually happened.
In my mind, at this point we either need (a) some previously "hidden" super-massive source of training data, or (b) another architectural breakthrough. Without either, this is a game of optimization, and the scaling curves are going to plateau really fast.
a) it hasn't even been a year since the last big breakthrough: the first reasoning model, o1, only came out last September, and we don't know how far that line will go yet. I'd wait a second before assuming the low-hanging fruit is gone.
b) I think coding is a really good environment for agents / reinforcement learning. Rather than requiring a continual supply of new training data, we give the model coding tasks to execute (writing / maintaining / modifying) and then test its code for correctness. We could for example take the entire history of a code-base and just give the model its changing unit + integration tests to implement. My hunch (with no extraordinary evidence) is that this is how coding agents start to nail some of the higher-level abilities.
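To make the idea concrete, here is a toy sketch of that training loop, under my assumptions: the "repo history" is a list of tasks paired with the tests that were added at that point, and `propose_patch` is a hypothetical stand-in for the model (here it just returns a fixed implementation so the loop actually runs). The reward signal is simply whether the candidate code passes the tests.

```python
# Toy sketch: reward a coding agent for making a repo's evolving test
# suite pass at each point in its history. `propose_patch` is a
# hypothetical placeholder for a real model.

def propose_patch(task_description: str) -> str:
    # A real agent would generate this from the task + failing tests;
    # here we hard-code a correct implementation to keep the sketch runnable.
    return "def add(a, b):\n    return a + b\n"

def run_tests(impl_src: str, test_src: str) -> float:
    """Reward signal: 1.0 if the candidate passes the tests, else 0.0."""
    namespace = {}
    try:
        exec(impl_src, namespace)   # load the candidate implementation
        exec(test_src, namespace)   # run the assertions against it
        return 1.0
    except Exception:
        return 0.0

# Stand-in for "the entire history of a code-base": each step is a task
# plus the unit tests that the commit at that point had to satisfy.
history = [
    ("implement add", "assert add(2, 3) == 5"),
    ("handle negatives", "assert add(-1, 1) == 0"),
]

total_reward = sum(
    run_tests(propose_patch(task), tests) for task, tests in history
)
```

In a real setup the tests would run in a sandboxed subprocess rather than via `exec`, and `total_reward` would feed a policy-gradient update; the point is just that correctness-checked coding tasks give you a reward signal without any new human-written training data.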
GPT-4 was another big improvement, and was the first time I found it useful for non-trivial queries. 4o was nice, and there was a decent bump with the reasoning models, especially for coding. However, since o1 it's felt a lot more like optimization than systematic improvement, and I don't see a way for current reasoning models to advance to the point of designing and implementing medium-or-larger coding projects without the assistance of a human.
Like the other commenter mentioned, I'm sure it will happen eventually with architectural improvements, but I wouldn't bet on 1-5 years.
Theoretical limitations of multi-layer Transformer https://arxiv.org/abs/2412.02975
o4 has no problem with the examples from the paper above (appendix A). You can see its reasoning here is also sound: https://chatgpt.com/share/681b468c-3e80-8002-bafe-279bbe9e18.... Not conclusive, unfortunately, since the paper falls within the date range of its training data. Still, reasoning models killed off a large class of "easy logic errors" that people discovered in the earlier generations.
They are not reasoning in any real sense; they are writing pages and pages of text before giving you the answer. This is not so different from the "ever bigger training data" method, just applied to the output instead of the input.