But LLMs are certainly a game changer; I can see them delivering impact bigger than the internet itself. Both require a lot of investment.
I find LLMs incredibly useful, but if you were following along over the last few years, the promise was “exponential progress”, with world-destroying superintelligence as the teaser.
We objectively are not on that path. There is no “coming of LLMs”. We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.
I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is that you don’t need to argue about it).
First you need to define what it means. What's the metric? Otherwise it's very much something you can argue about.
Language model capability at generating text output.
The model progress this year has been a lot of:
- “We added multimodal”
- “We added a lot of non-AI tooling” (i.e. agents)
- “We put more compute into inference” (i.e. thinking mode)
So yes, there is still rapid progress, but these ^ make it clear, at least to me, that next-gen models are significantly harder to build.
Simultaneously, we see a distinct convergence between players (OpenAI, DeepSeek, Mistral, Google, Anthropic) in their offerings.
That’s usually a signal that the rate of progress is slowing.
Remind me, what was so great about GPT-5? How about GPT-4 compared to GPT-3?
Do you even remember the releases? Yeah, I don’t either. I had to look them up.
Just another model with more or less the same capabilities.
“Mixed reception”
That is not what exponential progress looks like, by any measure.
The progress this year has been in the tooling around the models and in smaller, faster models with similar capabilities, plus multimodal add-ons that no one asked for, because it’s easier to add image and audio processing than to improve text handling.
That may still be a path to AGI, but it is not an exponential one.
> Simultaneously, we see a distinct convergence between players (OpenAI, DeepSeek, Mistral, Google, Anthropic) in their offerings. That’s usually a signal that the rate of progress is slowing.
I agree with you on the first part but not the second…why would convergence in performance between players indicate anything about the absolute rate of improvement of frontier models?
> Remind me, what was so great about GPT-5? How about GPT-4 compared to GPT-3? Do you even remember the releases? Yeah, I don’t either. I had to look them up.
GPT-3 -> GPT-4 -> GPT-5 were extraordinary leaps…I’m not sure how one could say anything else.
> Just another model with more or less the same capabilities.
GPT-5 is absolutely not a model with more or less the same capabilities as GPT-4. What could you mean by this?
> “Mixed reception”
A mixed reception is an indication of model performance against a backdrop of market expectations, not against GPT-4…
> That is not what exponential progress looks like, by any measure.
Sure it is…exponential just means a constant % improvement per year, and we’re absolutely in that regime by a lot of measures.
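To make the definitional point concrete, here’s a toy sketch (all numbers invented for illustration, not tied to any real benchmark or model). It shows why this is genuinely arguable: a logistic (sigmoid) curve is nearly indistinguishable from an exponential while it’s still far below its ceiling, so a few years of early data can’t settle which regime we’re in.

```python
import math

# Hypothetical numbers purely for illustration.
C0, r = 1.0, 0.5          # baseline "capability" and a constant 50%/year improvement
L = 100.0                 # logistic ceiling
k = math.log(1 + r)       # logistic rate matched to the same early growth
t0 = math.log(L / C0) / k # midpoint chosen so both curves start near C0

def exponential(t):
    # constant % improvement per year: C(t) = C0 * (1 + r) ** t
    return C0 * (1 + r) ** t

def sigmoid(t):
    # logistic curve: approximately exponential while far below its ceiling L
    return L / (1 + math.exp(-k * (t - t0)))

for t in range(6):
    e, s = exponential(t), sigmoid(t)
    print(f"year {t}: exponential={e:6.2f}  sigmoid={s:6.2f}  ratio={s / e:.2f}")
```

With these made-up parameters the two curves agree to within about 7% over the first five years, which is exactly why “what’s the metric?” upthread matters: you can’t tell the curves apart without a defined measure and a long enough horizon.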
> The progress this year has been in the tooling around the models and in smaller, faster
Effective tool use is not some trivial add-on; it is a core capability, and one for which we are on an exponential progress curve.
> models with similar capabilities, plus multimodal add-ons that no one asked for, because it’s easier to add image and audio processing than to improve text handling.
This is definitely a personal feeling of yours; multimodal models are not something no one asked for…they are absolutely essential. Text data is essential, and data curation is non-trivial and continually improving, but we are also hitting the ceiling of internet text data. Yet we use an incredible amount of synthetic data for RL, and this continues to grow…you guessed it, exponentially. Multimodal data is also incredibly information-rich. Adding multimodality lifts all boats and provides core capabilities necessary for open-world reasoning and even better text data (e.g. understanding charts and image context for text).