zlacker

[return to "2025: The Year in LLMs"]
1. ksec+RD[view] [source] 2026-01-01 07:40:29
>>simonw+(OP)
All these improvements in a single year, 2025. While this may seem obvious to those who follow the AI / LLM news, it may be worth pointing out again that ChatGPT was introduced to us in November 2022.

I still don't believe AGI, ASI, or whatever AI will overtake humans in a short period of time, say 10-20 years. But it is hard to argue against the value of current AI, which many of the vocal critics on HN nevertheless seem to do. People are willing to pay $200 per month, and it is already at a $1B run rate.

Being more of a hardware person, the most interesting part to me is the funding of all the latest hardware development. I know this is another topic HN hates because of the DRAM and NAND pricing issues, but it is exciting to see this from a long-term view, where the pricing is short-term pain. Right now the industry is asking: we collectively have over a trillion dollars to spend on capex over the next few years, and will borrow more if need be, so when can you ship us 16A / 14A / 10A and 8A or 5A, LPDDR6, higher-capacity DRAM at lower power usage, better packaging, higher-speed PCIe, or a jump to optical interconnects? Every single part of the hardware stack is being infused with money and demand. The last time we had this was the post-PC / smartphone era, which drove the hardware industry forward for 10-15 years. The current AI wave can push hardware for at least another 5-6 years while pulling forward tech that was originally 8-10 years away.

I so wish I had bought some Nvidia stock. Then again, I guess no one knew AI would be as big as it is today, and it has only just started.

◧◩
2. wpietr+T61[view] [source] 2026-01-01 13:26:48
>>ksec+RD
This is not a great argument:

> But it is hard to argue against the value of current AI [...] it is already at a $1B run rate.

The psychic services industry makes over $2 billion a year in the US [1], with about a quarter of the population being actual believers [2].

[1] https://www.ibisworld.com/united-states/industry/psychic-ser...

[2] https://news.gallup.com/poll/692738/paranormal-phenomena-met...

◧◩◪
3. ctoth+mA1[view] [source] 2026-01-01 17:06:40
>>wpietr+T61
2022/2023: "It hallucinates, it's a toy, it's useless."

2024/2025: "Okay, it works, but it produces security vulnerabilities and makes junior devs lazy."

2026 (Current): "It is literally the same thing as a psychic scam."

Can we at least make predictions for 2027? What shall the cope be then! Lemme go ask my psychic.

◧◩◪◨
4. bopbop+KD1[view] [source] 2026-01-01 17:25:38
>>ctoth+mA1
2022/2023: "Next year software engineering is dead"

2024: "Now this time for real, software engineering is dead in 6 months, AI CEO said so"

2025: "I know a guy who knows a guy who built a startup with an LLM in 3 hours, software engineering is dead next year!"

What will be the cope for you this year?

◧◩◪◨⬒
5. aspenm+n42[view] [source] 2026-01-01 20:15:30
>>bopbop+KD1
The cope + disappointment will be watching a large population of HN users paint a weird alternative reality. There is a multitude of messages about AI out there, some highly detached from reality (on both the optimistic and pessimistic sides). And then there is the rational middle: professionals who see the obvious value of coding agents in their workflow and use them extensively (or figure out how best to leverage them to get the most mileage).

I don't see software engineering ever being "dead", but the nature of the job _has already changed_ and will continue to change. Look at Sonnet 3.5 -> 3.7 -> 4.5 -> Opus 4.5: that was 17 months of development, and the leaps in performance are quite impressive. You then have massive hardware buildouts, improvements to the stack, a ton of R&D, and competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks), plus a push towards the next paradigm to solve things like continual learning.

Some folks have opted not to use coding agents (and some folks, like yourself, seem to revel in strawmanning people who point out their demonstrable usefulness). Not using coding agents in Jan 2026 is defensible. It won't be defensible for long.
◧◩◪◨⬒⬓
6. bopbop+F52[view] [source] 2026-01-01 20:25:58
>>aspenm+n42
Please do provide some data for this "obvious value of coding agents". Because right now the only things that are obvious are the increase in vulnerabilities, people claiming they are 10x more productive but not shipping anything, and AI hype bloggers who fail to provide any quantitative proof.
◧◩◪◨⬒⬓⬔
7. aspenm+z72[view] [source] 2026-01-01 20:39:15
>>bopbop+F52
Sure: at my MAANG company, where I closely watch the data on adoption of CC and other internal coding-agent tools, most (significant) LOC are written by agents, most employees have adopted coding agents as WAU, and the adoption rate is positively correlated with seniority.

Like a lot of LLM-related things (Simon Willison's pelican test, researchers + product leaders implementing AI features), I also heavily "vibe check" the capabilities myself on real work tasks. The fact of the matter is that I am able to dramatically speed up my work. It may be actually writing production code + helping me review it, or it may be tasks like: write me a script to diagnose this bug, or build me a Streamlit dashboard to analyze + visualize this ad hoc data instead of me taking an hour to make visualizations + munge data in a notebook.
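For concreteness, the kind of throwaway dashboard I mean is on the order of a dozen lines of Streamlit. A minimal sketch (the file name and columns here are made up, not from any real task):

    # ad_hoc_dashboard.py -- run with: streamlit run ad_hoc_dashboard.py
    import pandas as pd
    import streamlit as st

    df = pd.read_csv("latency_samples.csv")  # hypothetical ad hoc data

    st.title("Ad hoc latency analysis")
    service = st.selectbox("Service", sorted(df["service"].unique()))
    subset = df[df["service"] == service]

    # quick summary stat plus a time-series view of the selected service
    st.metric("p95 latency (ms)", round(subset["latency_ms"].quantile(0.95), 1))
    st.line_chart(subset, x="timestamp", y="latency_ms")
    st.dataframe(subset.describe())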

> people claiming they are 10x more productive but not shipping anything, and AI hype bloggers who fail to provide any quantitative proof.

What would satisfy you here? I feel you are strawmanning a bit by picking the most hyperbolic statements and then blanketing them onto everyone else.

My workflow is now:

- Write code exclusively with Claude

- Review the code myself + use Claude as a sort of review assistant to help me understand decisions about parts of the code I'm confused about

- Provide feedback to Claude to change / steer it away or towards approaches

- Give up when Claude is hopelessly lost

It takes a while to get the hang of the right balance, but in my personal experience (which I doubt you will take seriously, but nevertheless): it is quite the game changer, and that's coming from someone who would have laughed at the idea of a $200 coding-agent subscription a year ago.

◧◩◪◨⬒⬓⬔⧯
8. Denzel+3T2[view] [source] 2026-01-02 02:30:35
>>aspenm+z72
We probably work at the same company, given you used MAANG instead of FAANG.

As one of the WAU (really DAU) you're talking about, I want to call out a couple of things:

1) The LOC metrics are flawed, and anyone using the agents knows this. E.g., ask CC to rewrite the 1 commit you wrote into 5 different commits; now you have 5 100% AI-written commits (toy example below).

2) Total speed-up across the entire dev lifecycle is far below 10x, most likely below 2x, but I don't see any evidence of anyone measuring the counterfactuals to prove a speed-up anyway, so there's no clear data.

3) Look at token spend for power users; you might be surprised by how many SWE-years they're spending.
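To make (1) concrete, a toy illustration of how naive per-commit attribution gets gamed (all numbers invented, and the metric itself is a simplification):

    # Toy example: naive per-commit "AI-written" LOC attribution (numbers invented)
    human = [{"loc": 500, "ai": False}]        # one 500-LOC commit written by hand
    resplit = [{"loc": 100, "ai": True}] * 5   # same 500 LOC, re-split by the agent

    def ai_share(commits):
        total = sum(c["loc"] for c in commits)
        return sum(c["loc"] for c in commits if c["ai"]) / total

    print(ai_share(human))    # 0.0
    print(ai_share(resplit))  # 1.0 -- identical code now counts as 100% AI-written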

Overall it’s unclear whether LLM-assisted coding is ROI-positive.

◧◩◪◨⬒⬓⬔⧯▣
9. aspenm+0R3[view] [source] 2026-01-02 13:21:36
>>Denzel+3T2
Oh yes, I agree with all of this. I tried to clarify this above, but your examples are clearer. My point is that all the measures and studies of AI's impact on productivity I have personally seen have been deeply flawed for one reason or another.

Total speed up is WAY less than 10x by any measure. 2x seems too high too.

By the data alone, the impact is a bit unclear, I agree. But I will say that, to me, starting from a prior formed by personal experience, there is a clear picture indicating some real productivity impact today, with a trajectory that suggests the claims of a lot of SWE work being offloaded to agents over the next few years are not that far-fetched:

- adoption and retention numbers, internally and externally. You can argue this is driven by perverse incentives and/or the perception-performance mismatch, but I'm highly skeptical of that even though the effects of both are probably real; it would be truly extraordinary to me if there weren't at least a ~10-20% bump in productivity today, with a lot of headroom to go as integration gets better, user skill gets better, and model capabilities grow

- benchmark performance: again, benchmarks are really problematic, but there are a lot of them, and together they paint a pretty clear picture of capabilities genuinely growing, and growing quickly

- there are clearly biases we can think of that would cause us to overestimate AI impact, but there are also biases that may cause us to underestimate impact: e.g. I’m now able to do work that I would have never attempted before. Multitasking is easier. Experiments are quicker and easier. That may not be captured well by e.g. task completion time or other metrics.

I even agree that the quality of agentic code can be a real risk, but:

- I think this ignores the fact that humans have also always written shitty code and always will; there is lots of garbage in production, believe me, and it predates agentic code

- as models improve, they can correct earlier mistakes

- it’s also a muscle to grow: how to review and use humans in the loop to improve quality and set a high bar
