zlacker

[return to "2025: The Year in LLMs"]
1. didip+Th[view] [source] 2026-01-01 02:38:52
>>simonw+(OP)
Indeed. I don't understand why Hacker News is so dismissive of the coming of LLMs; maybe HN readers are going through the five stages of grief?

But LLMs are certainly a game changer; I can see them delivering an impact bigger than the internet itself. Both require a lot of investment.

2. virapt+Xr[view] [source] 2026-01-01 04:44:07
>>didip+Th
Based on quite a few comments recently, it also looks like many people have tried LLMs in the past but haven't seriously revisited either the modern or the more expensive models. And I get it. Not everyone wants to keep up to date every month, or burn cash on experiments. But at the same time, people seem to have opinions formed in 2024. (Especially if they talk only about hallucinations and broken code - tell the agent to search for docs and fix stuff.) I'd really like to give them Opus 4.5 as an agent to refresh their views. There's lots to complain about, but the world has moved on significantly.
3. techpr+fJ[view] [source] 2026-01-01 08:56:08
>>virapt+Xr
Just last week Opus 4.5 decided that the way to fix a test was to change the code so that everything else but the test broke.

When people say "fix stuff", I always wonder whether it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).

4. baq+gU[view] [source] 2026-01-01 11:04:35
>>techpr+fJ
Nice. Did it realize the mistake and correct it?
5. techpr+pV[view] [source] 2026-01-01 11:17:10
>>baq+gU
Nope. I did get a lot of fancy markdown with emojis, though, so I guess that was a nice tradeoff.

In general, even with access to the entire code base (which is very small), I find the models' inherent need to satisfy the prompter to be their biggest flaw, since it constantly tends to lead down this path. I often have to correct overly convoluted SQL too, because my problems are simple and the training data seems to favor extremely advanced operations.
