But LLMs are certainly a game changer; I can see them delivering an impact bigger than the internet itself. Both require a lot of investment.
The most wide-appeal possibility is people loving 100%-AI-slop entertainment like that AI Instagram Reels product. Maybe I'm just too disconnected from normies, but I don't see this taking off. Fun as a novelty, like those Ring cam vids, but I would never spend all day watching AI-generated media.
Kagi’s Research Assistant is pretty damn useful, particularly when I can have it poll different models. I remember when the first iPhone lacked copy-paste. This feels similar.
(And I don’t think we’re heading towards AGI.)
Even if you skip ARPAnet, you’re forgetting the Gopher days, and even if you jump straight to WWW+email==the internet, you’re forgetting the Mosaic days.
The applications that became useful to the masses emerged a decade+ after the public internet and even then, it took 2+ decades to reach anything approaching saturation.
Your dismissal is not likely to age well, for similar reasons.
I know a lot of "normal" people who have completely replaced their search engine with AI. It's increasingly a staple for people.
Smartphones were absolutely NOT immediately useful in a million different ways for almost every person; that's total revisionist history. I remember when the iPhone came out: it was AT&T-only and did almost nothing useful. Smartphones were a novelty for quite a while.
it isn't irrational to act in self-interest. If LLMs threaten someone's livelihood, it doesn't matter one bit that they help humanity overall - they will oppose them. I don't blame them. But I also hope that they cannot succeed in opposing it.
The opposition to AI comes from people who feel threatened by it: it either threatens their livelihood (or that of family/friends), or they feel they are unable to benefit from AI in the way they did from the internet and mobile phones.
I find LLMs incredibly useful, but if you were following along the last few years, the promise was for “exponential progress” with a teaser of world-destroying superintelligence.
We objectively are not on that path. There is no “coming of LLMs”. We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.
I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is you don’t need to argue about it)
If "immediate" usefulness is the metric we measure, then the internet and smartphones are pretty insignificant inventions compared to LLM.
(of course it's not a meaningful metric, as there is no clear line between a dumb phone and a smartphone, or a moderately sized language model and an LLM)
LLMs have real limitations that aren't going away any time soon - not until we move to a fundamentally different technology that shares almost nothing in common with them. There's a lot of 'progress-washing' going on, where people claim these shortfalls will magically disappear if we throw enough data and compute at them, when they clearly will not.
Those are some very rosy glasses you've got on there. The nascent Internet took forever to catch on. It was for weird nerds at universities and was "never going to catch on" - but here we are.
LLMs are being driven mostly by grifters trying to achieve a monopoly before they run out of cash. Under those conditions I find their promises hard to believe. I'll wait until they either go broke or stop losing money left and right, and whatever is left is probably actually useful.
You'll note I don't mention AGI or future model releases in my annual roundup at all. The closest I get to that is expressing doubt that the METR chart will continue at the same rate.
If you focus exclusively on what actually works, the LLM space is a whole lot more interesting and less frustrating.
Search, as of today, is inferior to frontier models as a product. However, the best case still misses expected returns by miles, which is where the grousing comes from.
Generative art/AI is still up in the air for staying power, but I'd predict it isn't going away.
A year after LLMs came out… are you kidding me?
Two years?
10 years?
Today, adding an MCP server to wrap the same API that's been around forever for some system makes the users of that system prefer the NLI over the GUI almost immediately.
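And the wrapper really can be tiny. A minimal sketch, assuming the official MCP Python SDK (FastMCP); the endpoint URL and the `lookup_order` tool are made up for illustration:

```python
# Minimal sketch of wrapping a long-standing REST API as an MCP tool.
# Assumes the official MCP Python SDK; the endpoint is hypothetical.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("legacy-orders")

@mcp.tool()
def lookup_order(order_id: str) -> dict:
    """Fetch an order from the legacy system by ID."""
    resp = httpx.get(f"https://internal.example.com/api/orders/{order_id}")
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio; point a client (or your chat UI) at it
```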
If you inherit 9000 tests from an existing project, you can vibe code a replacement on your phone over a holiday, like Simon Willison's JustHTML port. We are moving from agents semi-randomly flailing around to constraint satisfaction, as in the sketch below.
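By "constraint satisfaction" I mean something like this loop, where the inherited test suite, not the prompt, defines done. `run_agent` is a hypothetical stand-in for whichever coding agent you drive; it is not a real API:

```python
# Sketch of "tests as constraints": iterate until the inherited suite passes.
# run_agent is a hypothetical placeholder for your coding agent, not a real API.
import subprocess

def run_agent(prompt: str) -> None:
    """Placeholder: hand the failing output to an agent and let it edit files."""
    raise NotImplementedError

for attempt in range(100):
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    if result.returncode == 0:
        print("All inherited tests pass; the port is done.")
        break
    # Feed back only the tail of the output so the prompt stays manageable.
    run_agent("These tests still fail; fix the port:\n" + result.stdout[-4000:])
```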
BUT when I hear my executive team talk and see demos of "Agentforce" and every SaaS company becoming an AI company promising the world, I have to roll my eyes.
The challenge I have with LLMs is that they are great at creating first-draft shiny objects, and the LLMs themselves overpromise. I am handed half-baked work created by non-technical people that I now have to clean up. And they don't realize how much work it takes to get from a 60% solution to a 100% solution, because it was so easy for them to get to the 60%.
Amazing, game-changing tools in the right hands, but they also give people false confidence.
Not that they aren't also useful for non-technical people, but I have had to spend a ton of time explaining to copywriters on the marketing team that they shouldn't paste their credentials into the chat even if it tells them to, and that their vibe-coded app is a security nightmare.
Eh. I wouldn’t be so quick to speak for the entirety of HN. Several articles related to LLMs easily hit the front page every single day, so clearly there are plenty of HN users upvoting them.
I think you're just reading too much into what is more likely classic HN cynicism and/or fatigue.
First you need to define what it means. What's the metric? Otherwise it's very much something you can argue about.
LLMs from late 2024 were nearly worthless as coding agents, so given they have quadrupled in capability since then (exponential growth, btw), it's not surprising to see a modestly positive impact on SWE work.
Also, I'm noticing you're not explaining yourself :)
I'd assume that around half of the optimists are emotionally motivated this way.
By what metric?
When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?
Lol. It's worse than nothing at all.
The issue is that you're not acknowledging or replying to people's explanations for _why_ they see this as exponential growth. It's almost as if you skimmed through the meat of the comment and then just re-phrased your original idea.
> When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?
This comparison doesn't make sense because we know the limits of cars, but we don't yet know the limits of LLMs. It's an open question. Whether or not an F1 engine can make it to the speed of light in 20 seconds is not an open question.
Yeah, probably. But no chart actually shows it yet. For now we are firmly in the exponential zone of the sigmoid curve and can't really tell if it's going to end in a year, a decade, or a century.
25 years ago I was optimistic about the internet, web sites, video streaming, online social systems. All of that. Look at what we have now. It was a fun ride until it all ended up “enshittified”. And it will happen to LLMs, too. Fool me once.
Some developer tools might survive in a useful state on subscriptions. But soon enough the whole A.I. economy will centralise into 2 or 3 major players extracting more and more revenue over time until everyone is sick of them. In fact, this process seems to be happening at a pretty high speed.
Once the users are captured, they’ll orient the ad-spend market around themselves. And then they’ll start taking advantage of the advertisers.
I really hope it doesn’t turn out this way. But it’s hard to be optimistic.
The NVIDIA CEO says people should stop learning to code. Now if LLMs will really end up as reliable as compilers, such that they can write code that's better and faster than I can 99% of the time, then he might be right. As things stand now, that reality seems far-fetched. To claim that they're useless because this reality has not yet been achieved would be silly, but not more silly than claiming programming is a dead art.
Language model capability at generating text output.
The model progress this year has been a lot of:
- “We added multimodal”
- “We added a lot of non-AI tooling” (i.e. agents)
- “We put more compute into inference” (i.e. thinking mode)
So yes, there is still rapid progress, but these ^ make it clear, at least to me, that next gen models are significantly harder to build.
Simultaneously we see a distinct narrowing between players (OpenAI, DeepSeek, Mistral, Google, Anthropic) in their offerings.
That's usually a signal that the rate of progress is slowing.
Remind me, what was so great about GPT-5? How about GPT-4 over GPT-3?
Do you even remember the releases? Yeah. I don't. I had to look it up.
Just another model with more or less the same capabilities.
“Mixed reception”
That is not what exponential progress looks like, by any measure.
The progress this year has been in the tooling around the models: smaller, faster models with similar capabilities, and multimodal add-ons that no one asked for, because it's easier to add image and audio processing than to improve text handling.
That may still be on a path to AGI, but it is not an exponential path.
My own "feeling" is that it's definitely not exponential but again, doesn't matter if it's unsustainable.
My point with the F1 comparison is that a short period of rapid improvement doesn't imply exponential growth; it's about as weird to expect that as to expect an F1 car to reach the speed of light. It's possible, you know - the regulations are changing for next season. If Leclerc sets a new lap record in Australia by .1 ms, we can just assume exponential improvements, and surely Ferrari will be lapping the rest of the field by the summer, right?
When an "AI skeptic" sees a very positive AI comment, they try to argue that it is indeed interesting but nowhere near close to AI/AGI/ASI or whatever the hype at the moment uses.
When an "AI optimistic" sees a very negative AI comment, they try to list all the amazing things they have done that they were convinced was until then impossible.
Outside the verifiable domains I think the impact is more assistance/augmentation than outright disruption (i.e. a novelty which is still nice). A little tiny bit of value sprinkled over a very large user base but each person deriving little value overall.
Even as they use it as search, it is at best an incremental improvement on what they used to do - not life changing.
That's not a metric; that's a vague, non-operationalized concept that could be operationalized into an infinite number of different metrics. And an improvement that was linear in one of those possible metrics would be exponential in another one (well, actually, one that was linear in one would also be linear in an infinite number of others, as well as being exponential in an infinite number of others).
That’s why you have to define an actual metric, not simply describe a vague concept of a kind of capacity of interest, before you can meaningfully discuss whether improvement is exponential. Because the answer is necessarily entirely dependent on the specific construction of the metric.
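A toy illustration of that point (not tied to any real benchmark): take a quantity that doubles every 7 months, similar in shape to the METR task-horizon claim, and the verdict flips depending on which metric you read it through:

```python
# One underlying progress series, two metrics, two verdicts:
# the raw value is exponential, while its log is perfectly linear.
import math

for month in range(0, 43, 7):
    value = 2 ** (month / 7)  # doubles every 7 months
    print(f"month {month:2d}: raw x{value:5.1f} (exponential), log2 {math.log2(value):.0f} (linear)")
```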
I think it never did. Still has not.
When people say “fix stuff” I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).
That's not a quantifiable sentence. Unless you put it in numbers, anyone can argue exponential/not.
> next gen models are significantly harder to build.
That's not how we judge capability progress though.
> Remind me, what was so great about GPT-5? How about GPT-4 over GPT-3?
> Do you even remember the releases?
At the GPT-3 level we could generate some reasonable code blocks / tiny features. (An example shown around at the time was "explain what this function does" for a "fib(n)".) At GPT-4, we could build features and tiny apps. At GPT-5, you can often one-shot whole apps from a vague description. The difference between them is massive for coding capabilities. Sorry, but if you can't remember that massive change... why are you making claims about the progress in capabilities?
> Multimodal add-ons that no one asked for
Not only does multimodal input training improve the model overall, it's useful for (for example) feeding back screenshots during development.
Basically, you're saying it's not perfect. I don't think anyone is claiming otherwise.
Very spurious claims, given that there was no effort made to check whether the IMO or ICPC problems were in the training set or not, or to quantify how far problems in the training set were from the contest problems. IMO problems are supposed to be unique, but since it's not at the frontier of math research, there is no guarantee that the same problem, or something very similar, was not solved in some obscure manual.
This barrier does not exist for current AI technologies, which are being given away free. Minor thought experiment: just how radical would the uptake of mobile phones have been if they were given away free?
https://chrisfrewin.medium.com/why-llms-will-never-be-agi-70...
Seems to be playing out that way.
But most discussion I see is vague and without specificity and without nuance.
Recognising the shortcomings of LLMs makes comments praising LLMs that much more believable; and recognising the benefits of LLMs makes comments criticising LLMs more believable.
I'd completely believe anyone who says they've found the LLM very helpful at greenfield frontend tasks, and I'd believe someone who found the LLM unable to carry out subtle refactors on an old codebase in a language that's not Python or JavaScript.
I'm just a casual user, but I've been doing the same and have noticed the sharp improvements of the models we have now vs a year ago. I have OpenAI Business subscription through work, I signed up for Gemini at home after Gemini 3, and I run local models on my GPU.
I just ask them various questions where I know the answer well, or I can easily verify. Rewrite some code, factual stuff etc. I compare and contrast by asking the same question to different models.
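The comparison loop itself is trivial to script. A sketch assuming an OpenAI-compatible client; the model names are just examples, swap in whatever you have access to:

```python
# Sketch: ask several models the same question where you already know
# the answer, then compare outputs side by side. Model names illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
question = "What does this code do?\n\ndef f(n): return n if n < 2 else f(n-1) + f(n-2)"

for model in ["gpt-4o-mini", "gpt-4o"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    print(f"--- {model} ---\n{resp.choices[0].message.content}\n")
```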
AGI? Hell no. Very useful for some things? Hell yes.
Why? Because even the bank teller is doing more than taking and depositing money.
IMO there is an ontological bias that pervades our modern society that confuses the map for the territory and has a highly distorted view of human existence through the lens of engineering.
We don't see anything in this time series, because this time series itself is meaningless nonsense that reflects exactly this special kind of ontological stupidity:
https://fred.stlouisfed.org/series/PRS85006092
As if the sum of human interaction in an economy is some kind of machine that we just need to engineer better parts for and then sum the outputs.
Any non-careerist, thinking person that studies economics would conclude we don't and will probably not have the tools to properly study this subject in our lifetimes. The high dimensional interaction of biology, entropy and time. We have nothing. The career economist is essentially forced to sing for their supper in a type of time series theater. Then there is the method acting of pretending to be surprised when some meaningless reductionist aspect of human interaction isn't reflected in the fake time series.
In general, even with access to the entire code base (which is very small), I find the models' inherent need to satisfy the prompter to be their biggest flaw, since it tends to constantly lead down this path. I often have to correct over-convoluted SQL too, because my problems are simple and the training data seems to favor extremely advanced operations.
The issue is that there’s no common definition of “fixed”. “Make it run no matter what” is a more apt description in my experience, which works to a point but then becomes very painful.
I can’t point at many problems it has meaningfully solved for me. I mean real problems, not tasks that I have to do for my employer. It seems like it just made parts of my existence more miserable, poisoned many of the things I love, and generally made the future feel a lot less certain.
Autodefenestrate - To eject or hurl oneself from a window, especially lethally
One of them is whether or not large models are useful and/or becoming more useful over time. (To me, clearly the answer is yes)
The other is whether or not they live up to the hype. (To me, clearly the answer is no)
There are other skirmishes around capability for novelty, their role in the economy, their impact on human cognition, if/when AGI might happen and the overall impact to the largely tech-oriented community on HN.
Most of the improvements are intangible. Can we truly say how much more reliable the models are? We barely have quantitative measurements on this so it’s all vibes and feels. We don’t even have a baseline metric for what AGI is and we invalidated the Turing test also based on vibes and feels.
So my argument is that part of the slowdown is itself a hallucination, because the improvement is not actually measurable or definable outside of vibes.
How would you put this on a graph?
I have great faith in AI in e.g. medical equipment, or otherwise as something built in, working on a single problem in the background, but the chat interface is terrible.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
https://metr.org/blog/2025-07-14-how-does-time-horizon-vary-...
You may just be a little early to the renaissance. What happens when the models we have today run on a mobile device?
The Nokia 6110 was released 15 years after the first commercial cell phone.
The weekend slumps could equally suggest people are using it at work.
> We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.
again, if it is "very clear" can you point to some concrete examples to illustrate what you mean?
> I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is you don’t need to argue about it)
OK but what specifically do you have an issue with here?
sometimes it seems like people are just living in another timeline.
Interesting thought about current SOTA models running on my mobile device. I've given it some thought and I don't think it would change my life in any way. Can you suggest some way that it would change yours?
I've been following for many years and the main exponential thing has been the Moore's law like growth in compute. Compute per dollar is probably the best tracking one and has done a steady doubling every couple of years or so for decades. It's exponential but quite a leisurely exponential.
The recent hype of the last couple of years is more dot-com-bubble-like, running ahead of trend, and will quite likely drop back.
Here's a graph of internet takeoff, with Krugman's famous 1998 quote that it wouldn't amount to much marking maybe the end of the skepticism: https://www.contextualize.ai/mpereira/paul-krugmans-poor-pre...
In common with AI there was probably a long period when the hardware wasn't really good enough for it to be useful to most people. I remember 300 baud modems and rubber things to try to connect to your telephone handset back in the 80s.
lol.... Just make sure you screenshot your post so you have a good reminder in a few years re. your predictive ability.
The same line of thinking does not hold with LLMs given their non-deterministic nature. Time will tell where things land.
The education part is on point. As a CS student, I see many of my colleagues using AI tools way too much for instant homework solving, without even processing the answers.
This shit has gotten worse since 2023.
> Simultaneously we see a distinct narrowing between players (OpenAI, DeepSeek, Mistral, Google, Anthropic) in their offerings. That's usually a signal that the rate of progress is slowing.
I agree with you on the first part but not the second…why would convergence of performance indicate anything about the absolute performance improvements of frontier models?
> Remind me, what was so great about GPT-5? How about GPT-4 over GPT-3? Do you even remember the releases? Yeah. I don't. I had to look it up.
3 -> 4 -> 5 were extraordinary leaps…not sure how one would be able to say anything else
> Just another model with more or less the same capabilities.
GPT-5 is absolutely not a model with more or less the same capabilities as GPT-4; what could you mean by this?
> “Mixed reception”
A mixed reception is an indication of model performance against a backdrop of market expectations, not against GPT-4…
> That is not what exponential progress looks like, by any measure.
Sure it is…exponential is a constant % improvement per year. We’re absolutely in that regime by a lot of measures.
> The progress this year has been in the tooling around the models: smaller, faster
Effective tool use is not some trivial add-on; it is a core capability for which we are on an exponential progress curve.
> models with similar capabilities, and multimodal add-ons that no one asked for, because it's easier to add image and audio processing than to improve text handling.
This is definitely a personal feeling of yours; multimodal models are not something no one asked for…they are absolutely essential. Text data is essential, and data curation is non-trivial and continually improving, but we are also hitting the ceiling of internet text data. Yet we use an incredible amount of synthetic data for RL, and this continues to grow……you guessed it, exponentially. And multimodal data is incredibly information-rich. Adding multimodality lifts all boats and provides core capabilities necessary for open-world reasoning and even better text data (e.g. understanding charts and image context for text).
I really think most everyone misses the actual potential of LLMs. They aren't an app but an interface.
They are the new UI everyone has known they wanted going back as long as we've had computers. People wanted to talk to the computer and get results.
Think of the people already using them instead of search engines.
To me, and likely you, it doesn't add any value. I can get the same information at about the same speed as before with the same false positives to weed through.
To the person that couldn't use a search engine and filled the internet with easily answered questions before, it's a godsend. They can finally ask the internet in plain ole whatever language they use and get an answer. It can be hard to see, but this is the majority of people on this planet.
LLMs raise the floor of information access. When they become ubiquitous and basically free, people will forget they ever had to use a mouse or hunt for the right pixel to click a button on a tiny mobile device touch screen.
You don't actually have to take peoples word for it, read epoch.ai developments, look into the benchmark literature, look at ARC-AGI...
I would really appreciate it if people could be specific when they say stuff like this because it's so crazy out of line with all measurement efforts. There are an insane amount of serious problems with current LLM / agentic paradigms, but the idea that things have gotten worse since 2023? I mean come on.
- METR task horizon
It's a mix: performance gains are bursty, but we have been getting a lot of bursts (RLVR, test-time compute, agentic breakthroughs).
I suppose if you pick a low enough exponent then the exp graph is flat for a long time and you're right: zero progress is “exponential” if you cherry-pick your growth rate to be low enough.
Generally though, people understand “exponential growth” as “getting better/bigger faster and faster in an obvious way”
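Concretely, a sketch of why the cherry-picked rate matters: 0.1% growth per step is exponential by definition, yet looks like zero progress over any window you'd actually plot:

```python
# 0.1% growth per step is exponential by definition, but after
# 100 steps the value has only moved about 10%: visually flat.
value = 1.0
for _ in range(100):
    value *= 1.001
print(f"{value:.3f}")  # ~1.105
```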
> 3 -> 4 -> 5 were extraordinary leaps…not sure how one would be able to say anything else
They objectively were not.
The metrics and the reception to them were very clear and overwhelming.
You're spitting some meaningless revisionist BS here.
You're wrong.
That's all there is to it.
You don’t understand what an exponential is, or apparently what the benchmark numbers even are, or possibly even how we actually measure model performance and the very real challenges and nuances involved, yet I’m “spitting some revisionist BS”. You have cited zero sources and are calling measured numbers “revisionist”.
You are also citing reception to models as some sort of indication of their performance, which is yet another confusing part of your reasoning.
I do agree that the “metrics were very clear”; it just seems you don’t happen to understand what they are or what they mean.
That's where the skepticism comes in, because one side of the discussion is hyping up exponential growth and the other is seeing something that looks more logarithmic instead.
I realize anecdotes aren't as useful as numbers for this kind of analysis, but there's such a wide gap between what people are observing in practice and what the tests and metrics are showing it's hard not to wonder about those numbers.
I can imagine them generating digital reality on the fly for users - no more dedicated applications, just pure creation on demand ('direct me via turn-by-turn 3D navigation to x then y and z', 'replay that goal that was just scored and overlay the 3 most recent similar goals scored like that in the bottom right corner of the screen', 'generate me a 3D adventure game to play in the style of Zelda, but make it about gnomes').
I suspect the only limitation for a product like this is energy and compute.