zlacker

[return to "2025: The Year in LLMs"]
1. didip+Th[view] [source] 2026-01-01 02:38:52
>>simonw+(OP)
Indeed. I don't understand why Hacker News is so dismissive about the coming of LLMs. Maybe HN readers are going through the five stages of grief?

But LLMs are certainly a game changer; I can see them delivering an impact bigger than the internet itself. Both require a lot of investment.

◧◩
2. crysta+fn[view] [source] 2026-01-01 03:37:59
>>didip+Th
> I don't understand why Hacker News is so dismissive about the coming of LLMs

I find LLMs incredibly useful, but if you were following along over the last few years, the promise was “exponential progress”, with world-destroying superintelligence teased at the end.

We objectively are not on that path. There is no “coming of LLMs”. We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.

I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified. (The nice thing about exponential progress is you don’t need to argue about it.)

◧◩◪
3. senord+SW1[view] [source] 2026-01-01 19:23:30
>>crysta+fn
I’ve been reading this comment multiple times a week for the last couple years. Constant assertions that we’re starting to hit limits, plateau, etc. But a cursory glance at where we are today vs a year ago, let alone two years ago, makes it wildly obvious that this is bullshit. The pace of improvement of both models and tooling has been breathtaking. I could give a shit whether you think it’s “exponential”, people like you were dismissing all of this years ago, meanwhile I just keep getting more and more productive.
◧◩◪◨
4. qualif+Gh2[view] [source] 2026-01-01 21:49:15
>>senord+SW1
People keep saying stuff like this, that the improvements are so obvious and breathtaking and astronomical, and then I go check out the frontier LLMs again and they're maybe a tiny bit better than they were last year, but I can't actually be sure because it's hard to tell.

Sometimes it seems like people are just living in another timeline.

◧◩◪◨⬒
5. aspenm+165[view] [source] 2026-01-02 20:54:59
>>qualif+Gh2
You might want to be more specific, because benchmarks abound and they paint a pretty consistent picture. LMArena "vibes" paint another picture. I don't know what you are doing to "check" the frontier LLMs, but whatever you're doing doesn't seem to match more careful measurement...

You don't actually have to take people's word for it: read epoch.ai's coverage of developments, look into the benchmark literature, look at ARC-AGI...