https://www.kaggle.com/competitions/konwinski-prize/
Currently, the #1 spot sits at a score of 0.09, not 0.9. A far cry from being useful. I know that open source models are not as good as closed source, but still, we're a long way from LLMs being good for code on their own.
And that supports OP's point - these tools aren't AGI, they produce trash that needs evaluation, but they're still useful.
The best IntelliSense and code completion tools would score 0.00. Those were the only tools we were using just a couple of years ago. 0.09 is a tremendous jump and the improvements will accelerate!
Do you think humans have achieved peak intelligence? If so, why, and if not, then why shouldn't you expect artificial forms of intelligence to improve up to and even surpass human abilities at some point?
Edit: to clarify, I'm not necessarily assuming unbounded acceleration, but tools always start out middling, improvements accelerate as we figure out what works and what doesn't, and then they taper off. We're just starting on the acceleration curve for AI.
We are quite far into the development cycle of LLMs. Literally billions of dollars have been poured into them. The rate of improvements over the last 6-12 months has slowed, not accelerated.
There hasn’t been any hint of an AGI breakthrough, so we’re dealing with tools that help herd stochastic parrots (i.e. agents) for the foreseeable future. And those tools only help with how much LLMs hallucinate; they don’t make the models more creative in a way that would improve these scores.
No, we've barely scratched the surface. Billions of dollars have been poured into the stupidest possible thing that could work + scaling, and we're only now trying more clever things. Fine-tuning on specific tasks will yield considerable productivity benefits in those domains.
Not only am I skeptical of your claim on the "rate of improvements over the last 6-12 months", but 6-12 months isn't even a compelling time horizon from which to infer any kind of trend at this stage.