[return to "2025: The Year in LLMs"]
1. didip+Th[view] [source] 2026-01-01 02:38:52
>>simonw+(OP)
Indeed. I don't understand why Hacker News is so dismissive about the coming of LLMs. Maybe HN readers are going through the five stages of grief?

But LLMs are certainly a game changer. I can see them delivering an impact bigger than the internet itself. Both require a lot of investment.

◧◩
2. virapt+Xr[view] [source] 2026-01-01 04:44:07
>>didip+Th
Based on quite a few comments recently, it also looks like many have tried LLMs in the past but haven't seriously revisited either the modern or the more expensive models. And I get it. Not everyone wants to keep up to date every month, or burn cash on experiments. But at the same time, people seem to have opinions formed in 2024. (Especially if they talk only about hallucinations and broken code: tell the agent to search for docs and fix stuff.) I'd really like to give them Opus 4.5 as an agent to refresh their views. There's lots to complain about, but the world has moved on significantly.
◧◩◪
3. techpr+fJ[view] [source] 2026-01-01 08:56:08
>>virapt+Xr
Just last week Opus 4.5 decided that the way to fix a test was to change the code so that everything else but the test broke.

When people say "fix stuff" I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).

◧◩◪◨
4. virapt+pL[view] [source] 2026-01-01 09:21:22
>>techpr+fJ
Sure, I get an occasional bad result from Opus. Then I revert and try again, or ask it for a fix. Even with a couple of restarts, it's going to be faster than me on average. (And that's ignoring the situations where I have to restart myself.)

Basically, you're saying it's not perfect. I don't think anyone is claiming otherwise.

◧◩◪◨⬒
5. techpr+2W[view] [source] 2026-01-01 11:24:11
>>virapt+pL
It’s not about being perfect. It’s about not being as great as the marketing, and many proponents, claim.

The issue is that there’s no common definition of "fixed". "Make it run no matter what" is a more apt description in my experience, which works to a point but then becomes very painful.
