E.g. I'm a software architect and developer for many years. So I know already how to build software but I'm not familiar with every language or framework. AI enabled me to write other kind of software I never learned or had time for. E.g. I recently re-implemented an android widget that has not been updated for a decade by it's original author. Or I fixed a bug in a linux scanner driver. None of these I could have done properly (within an acceptable time frame) without AI. But also none of there I could have done properly without my knowledge and experience, even with AI.
Same for daily tasks at work. AI makes me faster here, but also makes me doing more. Implement tests for all edge cases? Sure, always, I saved the time before. More code reviews. More documentation. Better quality in the same (always limited) time.
I've found giving the LLMs the input and output interfaces really help keep them on rails, while still being involved in the overall process without just blindly "vibe coding."
Having the AI also help with unit tests around business logic has been super helpful in addition to manual testing like normal. It feels like our overall velocity and code quality has been going up regardless of what some of these articles are saying.
I see it empowering to build custom tooling which need not be a high quality maintenance project.
So I’m not sure a study from 2024 or impact on code produced during 2024 2025 can be used to judge current ai coding possibilities.
Even that could use some nuance. I'm generating presentations in interactive JS. If they work, they work - that's the result, and I extremely don't care about the details for this use case. Nobody needs to maintain them, nobody cares about the source. There's no need for "properly" in this case.
I think LLM producers can improve their models by quite a margin if customers train the LLM for free, meaning: if people correct the LLM, the companies can use the session context + feedback to as training. This enables more convincing responses for finer nuances of context, but it still does not work on logical principles.
LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.
Hence the feedback these models get could theoretically funnel them to unnecessarily complicated solutions.
No clue has any research been done into this, just a thought OTTOMH.
I agree, I write out the sketch of what I want. With a recent embedded project in C I gave it a list of function signatures and high level description and was very satisfied with what it produced. It would have taken me days to nail down the particulars of the HAL (like what kind of sleep do I want what precisely is the way to setup the WDT and ports).
I think it's also language dependent.
I imagine JavaScript can be a crap shoot. The language is too forgiving.
Rust is where I have had most success. That is likely a personal skill issue, I know we want a Arc<DashMap>, will I remember all the foibles of accessing it? No.
But given the rigidity of the compiler and strong typing I can focus on what the code functionally is doing, that in happy with the shape/interface and function signature and the compiler is happy with the code.
It's quite fast work. It lets me use my high level skills without my lower level skills getting in the way.
And id rather rewrite the code at a mid-level then start it fresh, and agree with others once it's a large code base then in too far behind in understanding the overall system to easily work on it. That's true of human products too - someone elses code always gives me the ick.
Contrast this to something you do know but can't be arsed to make; you can keep re-rolling a design until you get something you know and can confirm works. Perfect, time saved.
Using Typescript works great because you can still build out the interfaces and with IDE integrations the AIs can read the language server results so they get all the type hints.
I agree that the AI code is usually a pretty good starting point and gets me up to speed for new features fast rather than starting everything from scratch. I usually end up refactoring the last 10-20% manually to give it some polish because some of the code still feels off some times.
There are some things here that folks making statements like yours often omit and it makes me very sus about your (over)confidence. Mostly these statements talk in a business short-term results oriented mode without mentioning any introspective gains (see empirically supported understanding) or long-term gains (do you feel confident now in making further changes _without_ the AI now that you have gained new knowledge?).
1. Are you 100% sure your code changes didn't introduce unexpected bugs?
1a. If they did, would you be able to tell if they where behaviour bugs (ie. no crashing or exceptions thrown) without the AI?
2. Did you understand why the bug was happening without the AI giving you an explanation?
2a. If you didn't, did you empirically test the AI's explanation before applying the code change?
3. Has fixing the bug improved your understanding of the driver behaviour beyond what the AI told you?
3a. Have you independently verified your gained understanding or did you assume that your new views on its behaviour are axiomatically true?
Ultimately, there are 2 things here: one is understanding the code change (why it is needed, why that particular change implementation is better relative to others, what future improvements could be made to that change implementation in the future) and skill (has this experience boosted your OWN ability in this particular area? in other words, could you make further changes WITHOUT using the AI?).
This reminds me of people that get high and believe they have discovered these amazing truths. Because they FEEL it not because they have actual evidence. When asked to write down these amazing truths while high, all you get in the notes are meaningless words. While these assistants are more amenable to get empirically tested, I don't believe most of the AI hypers (including you in that category) are actually approaching this with the rigour that it entails. It is likely why people often think that none of you (people writing software for a living) are experienced in or qualified to understand and apply scientific principles to build software.
Arguably, AI hypers should lead with data not with anecdotal evidence. For all the grandiose claims, the lack of empirical data obtained under controlled conditions on this particular matter is conspicuous by its absence.
I've been playing with various AI tools and homebrew setups for a long time now and while I see the occasional advantage it isn't nearly as much of a revolution as I've been led to believe by a number of the ardent AI proponents here.
This is starting to get into 'true believer' territory: you get these two camps 'for and against' whereas the best way forward is to insist on data rather than anecdotes.
AI has served me well, no doubt about that. But it certainly isn't a passe-partout and the number of times it has caused gross waste of time because it insisted on chasing some rabbit simply because it was familiar with the rabbit adds up to a considerable loss in productivity.
The scientific principle is a very powerful tool in such situations and anybody insisting on it should be applauded. It separates fact from fiction and allows us to make impartial and non-emotional evaluations of both theories and technologies.
Yup, most models suffer from this. Everyone is raving about million tokens context, but none of the models can actually get past 20% of that and still give as high quality responses as the very first message.
My whole workflow right now is basically composing prompts out of the agent, let them run with it and if something is wrong, restart the conversation from 0 with a rewritten prompt. None of that "No, what I meant was ..." but instead rewrite it so the agent essentially solves it without having to do back and forth, just because of this issue that you mention.
Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.
I think that's an issue with online discussions. It barely happens to me in the real world, but it's huge on HN.
I'm overall very positive about AI, but I also try to be measured and balanced and learn how to use it properly. Yet here on HN, I always get the feeling people responding to me have decided I am a "true believer" and respond to the true believer persona in their head.
How often have you written code and been 100% your code didn't introduce ANY bugs?
Seriously, for most of the code out there who cares? If it's in a private or even public repo, it doesn't matter.
> Are you 100% sure your code changes didn't introduce unexpected bugs?
Who is this ever? But I do code reviews and I usually generate a bunch of tests along with my PRs (if the project has at lease _some_ test infrastructure).
Same applies for the rest of the points. But that's only _my_ way to do these things. I can imagine that others do it a different way and that the points above are more problematic then.
Anytime I ask for demonstration of what the actual code looks like, when people start talking about their own "multi-agent orchestration platforms" (or whatever), they either haven't shared anything (yet), don't care at all about how the code actually is and/or the code is a horrible vibeslopped mess that contains mostly nonsense.
Not to be pedantic but, do you _try_ to understand? Or do you _actually_ understand the changes? This suggests to me that there are instances where you don't understand the generated code on projects others than your own, which is literally my point and that of many others. And even if you did understand it, as I pointed out earlier, that's not enough. It is a low bar imo. I will continue to keep my mind open but yours isn't a case study supporting the use of these assistants but the opposite.
In science, when a new idea is brought forward, it gets grilled to no end. The greater the potential the harder the grilling. Software should be no different if the builders want to lay a claim on the name "engineer". It is sad to see a field who claims to apply scientific principles to the development of software not walking the walk.
On complex topics where I know what I'm talking about, model output contains so much garbage with incorrect assumptions.
But complex topics where I'm out of my element, the output always sounds strangely plausible.
This phenomenon writ large is terrifying.
This reminds me of people who get sad when they realize they haven’t discovered anything amazing.
I am pedantic and “people that” → “people who” (for people, who is preferred).
All these people thinking that if only we add enough billions of parameters when the LLM is learning and add enough tokens of context, then eventually it’ll actually understand the code and make sensible decisions? These same people perhaps also believe if Penn and Teller cut enough ladies in half on stage they’ll eventually be great doctors.
curious to hear if you are still seeing code degradation over time?
wondering what sort of artifacts beyond ADR/natural language prompts help steer LLMs to do the right thing
Now you don't have to pay a lot of money to get a mediocre solution that works.
All those things that are broken, but you don't have time or money for them, you can have them fixed now.
The only catch is that you need to periodically review it because it'll accumulate things that are not important, or that were important but aren't anymore.
> if people correct the LLM, the companies can use the session context + feedback to as training.
it definitely seems that way; just the other day coderabbit was asking me where i found x when when it told me x didn't exist... > LLM interaction with customers might become the real learning phase.
sometimes i wonder why i pay for this if i'm supposed to train this thing...