Citation needed. In fact, I think this pretty clearly hits the "extraordinary claims require extraordinary evidence" bar.
Here someone just claimed that it is "entirely clear" LLMs will become super-human, without any evidence.
https://en.wikipedia.org/wiki/Extraordinary_claims_require_e...
The way you've framed it, it sounds like the only evidence you'll accept is the thing actually having happened.
In my mind, at this point we either need (a) some previously "hidden" super-massive source of training data, or (b) another architectural breakthrough. Without either, this is a game of optimization, and the scaling curves are going to plateau really fast.
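To make the "plateau without new data" worry concrete, here's a toy sketch (my own illustration, using a Chinchilla-style power law with roughly the published fitted constants, not anything measured on current frontier models). With the data term held fixed, loss bottoms out at a floor that no amount of extra parameters or compute can push past:

    # Toy illustration of a Chinchilla-style scaling law L(N, D) = E + A/N^a + B/D^b.
    # Constants are roughly the published Chinchilla fit; treat them as illustrative only.
    def loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
        return E + A / N**alpha + B / D**beta

    D = 1.4e12  # training tokens, held fixed ("no hidden super-massive data source")
    for N in (1e9, 1e10, 1e11, 1e12):
        print(f"params N={N:.0e}: predicted loss={loss(N, D):.3f}")
    # Loss creeps toward E + B/D**beta and stalls there: scaling parameters alone
    # stops paying off once the fixed data term dominates.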
a) It hasn't even been a year since the last big breakthrough; the first reasoning models only came out in September (with o3 newer still), and we don't know how far those will go yet. I'd wait a second before assuming the low-hanging fruit has all been picked.
b) I think coding is a really good environment for agents / reinforcement learning. Rather than requiring a continual supply of new training data, we give the model coding tasks to execute (writing / maintaining / modifying code) and then test its output for correctness. We could, for example, take the entire history of a code-base and have the model re-implement each change against the evolving unit + integration tests, as sketched below. My hunch (with no extraordinary evidence) is that this is how coding agents start to nail some of the higher-level abilities.
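A rough sketch of what that reward signal could look like (entirely hypothetical; reward_from_tests and the git/pytest plumbing are my own illustration, not any real training harness): apply a candidate patch to a throwaway copy of the repo and score it by whether the existing unit + integration tests still pass.

    # Hypothetical sketch: score a model-generated patch with the repo's own tests.
    import shutil, subprocess, tempfile

    def reward_from_tests(repo_dir: str, patch: str) -> float:
        """Apply the patch to a scratch copy of repo_dir and return 1.0 if the
        test suite passes, 0.0 otherwise (a pass fraction would give a smoother
        reward, but this is the simplest version)."""
        workdir = tempfile.mkdtemp()
        try:
            shutil.copytree(repo_dir, workdir, dirs_exist_ok=True)
            applied = subprocess.run(["git", "apply", "-"], input=patch, text=True,
                                     cwd=workdir, capture_output=True)
            if applied.returncode != 0:
                return 0.0  # an unappliable patch earns nothing
            try:
                tests = subprocess.run(["python", "-m", "pytest", "-q"], cwd=workdir,
                                       capture_output=True, text=True, timeout=600)
            except subprocess.TimeoutExpired:
                return 0.0  # hangs count as failures too
            return 1.0 if tests.returncode == 0 else 0.0
        finally:
            shutil.rmtree(workdir, ignore_errors=True)

Replay the repo's history commit by commit and you get a long curriculum of such tasks for free, with the evolving tests acting as the grader.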
I think everyone expected AlphaGo to be the research direction to pursue, which is why it was so surprising that LLMs turned out to work.
My friend, we have been living in a world of exponentially increasing AI capability, at least for the last few years; who knows what the future will bring!
GPT-4 was another big improvement, and was the first time I found it useful for non-trivial queries. 4o was nice, and there was a decent bump with the reasoning models, especially for coding. However, since o1 it has felt a lot more like optimization than systematic improvement, and I don't see a way for current reasoning models to advance to the point of designing and implementing medium+ coding projects without the assistance of a human.
Like the other commenter mentioned, I'm sure it will happen eventually with architectural improvements, but I wouldn't bet on 1-5 years.
Last month I had a staff member design and build a distributed system that would have been far beyond their capabilities without AI assistance. As a business owner, this lets me reduce my dependency on (and the power of) the senior devs.
Don't parrot the line you read online that these systems are unable to do this stuff; it comes from the clueless, or from devs coping. Not only are they capable, but they're improving by the month.
Does that junior dev take responsibility when that system breaks?
But... the capabilities (and rate of progression) of these top-tier LLMs aren't hype.
Very soon our AI-built software systems will break down in spectacular, never-before-seen ways, and I'll have the product to help with that.
Secondly, people are not just blindly having AI write code with no idea how it works. The AI is acting as a senior consultant helping the developer to design and build the systems and generating parts of the code as they work together.
"Theoretical limitations of multi-layer Transformer": https://arxiv.org/abs/2412.02975
Because exponentially growing costs with linear (or not even measurable) improvements are not a great trajectory.
o4 has no problem with the examples from the first paper (appendix A). You can see that its reasoning here is also sound: https://chatgpt.com/share/681b468c-3e80-8002-bafe-279bbe9e18.... It's not conclusive, unfortunately, since this falls within the date range of its training data. Reasoning models did kill off a large class of "easy logic errors" people found in the earlier generations, though.
They are not reasoning in any real sense; they are writing pages and pages of text before giving you the answer. This is not so unlike the "ever bigger training data" method, just applied to the output instead of the input.
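To make that concrete, here's a toy sketch (my illustration; generate() is a placeholder for whatever model call you have, not a real API). The "reasoning" recipe is largely buying accuracy with more sampled output text, e.g. long traces plus a majority vote:

    # Toy sketch of scaling the *output* instead of the training data:
    # sample many long reasoning traces and keep the majority final answer.
    from collections import Counter

    def answer_with_test_time_compute(prompt, generate, n_samples=16):
        finals = []
        for _ in range(n_samples):
            # generate() is a hypothetical stand-in for the model call.
            trace = generate(prompt + "\nThink step by step, then write FINAL: <answer>")
            # Keep whatever follows the last FINAL: marker as the answer.
            finals.append(trace.rsplit("FINAL:", 1)[-1].strip())
        return Counter(finals).most_common(1)[0][0]

More samples and longer traces generally help, but nothing new has been learned; it's the same model spending more tokens.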
Metrics like training data set size are less interesting now given the utility of smaller synthetic data sets.
Once AI tech is more diffused to factory automation, robotics, educational systems, scientific discovery tools, etc., then we could measure efficiency gains.
My personal metric for the next 5 to 10 years: US national debt and interest payments are perhaps increasing exponentially, and since nothing will change politically to address this, exponential AI capability growth will either juice productivity enough to save us economically, or it won't.