LLMs from late 2024 were nearly worthless as coding agents, so given they have quadrupled in capability since then (exponential growth, btw), it's not surprising to see a modestly positive impact on SWE work.
Also, I'm noticing you're not explaining yourself :)
By what metric?
When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?
The issue is that you're not acknowledging or replying to people's explanations for _why_ they see this as exponential growth. It's almost as if you skimmed the meat of the comment and then just rephrased your original idea.
> When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?
This comparison doesn't make sense because we know the limits of cars but we don't yet know the limits of LLMs. It's an open question. Whether or not an F1 engine can make it to the speed of light in 20 seconds is not an open question.
My point with the F1 comparison is that a short period of rapid improvement doesn't imply exponential growth, and expecting that is about as weird as expecting an F1 car to reach the speed of light. It's possible, you know - the regulations are changing for next season. If Leclerc sets a new lap record in Australia by .1 ms, can we just assume exponential improvements, and surely Ferrari will be lapping the rest of the field by the summer, right?
I think it never did. Still has not.
Very spurious claims, given that there was no effort made to check whether the IMO or ICPC problems were in the training set or not, or to quantify how far the problems in the training set were from the contest problems. IMO problems are supposed to be unique, but since they're not at the frontier of math research, there is no guarantee that the same problem, or something very similar, wasn't solved in some obscure manual.
Why? Because even the bank teller is doing more than taking and depositing money.
IMO there is an ontological bias pervading modern society that mistakes the map for the territory and views human existence through the distorted lens of engineering.
We don't see anything in this time series, because this time series itself is meaningless nonsense that reflects exactly this special kind of ontological stupidity:
https://fred.stlouisfed.org/series/PRS85006092
As if the sum of human interaction in an economy were some kind of machine that we just need to engineer better parts for and then sum the outputs.
Any non-careerist, thinking person who studies economics would conclude that we don't, and probably won't in our lifetimes, have the tools to properly study this subject: the high-dimensional interaction of biology, entropy, and time. We have nothing. The career economist is essentially forced to sing for their supper in a kind of time-series theater. Then there is the method acting of pretending to be surprised when some meaningless reductionist aspect of human interaction isn't reflected in the fake time series.
- METR task horizon:
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
https://metr.org/blog/2025-07-14-how-does-time-horizon-vary-...
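For concreteness, here's a minimal sketch of what that framing implies, assuming the roughly 7-month doubling time reported in the first METR post; the late-2024 baseline horizon below is an illustrative made-up number, not METR's actual figure:

  # Sketch: a fixed doubling time implies exponential growth in task horizon.
  # DOUBLING_MONTHS is METR's reported ~7-month figure; the baseline is assumed.
  DOUBLING_MONTHS = 7.0
  BASELINE_HORIZON_MIN = 30.0  # hypothetical late-2024 horizon, in minutes

  def horizon_after(months: float) -> float:
      # Task horizon after `months`, if the exponential trend holds.
      return BASELINE_HORIZON_MIN * 2 ** (months / DOUBLING_MONTHS)

  for months in (0, 7, 14, 21):
      print(f"+{months:2d} mo: ~{horizon_after(months):.0f} min")
  # Quadrupling is two doublings, i.e. ~14 months at this rate - which is
  # consistent with "quadrupled since late 2024" only if the trend holds.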
It's a mix: performance gains are bursty, but we have been getting a lot of bursts (RLVR, test-time compute, agentic breakthroughs).