zlacker

For me, AI is an enabler for things you can't do otherwise (or that would take many weeks of learning). But you still need to know how to do things properly in general, otherwise the results are bad.

E.g. I've been a software architect and developer for many years. So I already know how to build software, but I'm not familiar with every language or framework. AI enabled me to write other kinds of software that I never learned or had time for. E.g. I recently re-implemented an Android widget that had not been updated for a decade by its original author. Or I fixed a bug in a Linux scanner driver. None of these I could have done properly (within an acceptable time frame) without AI. But also none of these I could have done properly without my knowledge and experience, even with AI.

Same for daily tasks at work. AI makes me faster here, but it also lets me do more. Implement tests for all edge cases? Sure, always; I've saved the time elsewhere. More code reviews. More documentation. Better quality in the same (always limited) time.

replies(10): >>joshbe+t3 >>mirsad+S3 >>varjag+O4 >>ivell+f5 >>trcf23+d7 >>virapt+j7 >>bonobo+le >>kilnin+5m >>netdev+ko >>bandra+0q

>>micw+(OP)
I'm in the same boat. I've been taking on much more ambitious projects both at work and personally by collaborating with LLMs. There are many tasks that I know I could do myself but would require a ton of trial and error.

I've found that giving the LLMs the input and output interfaces really helps keep them on rails, while I stay involved in the overall process instead of just blindly "vibe coding."
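As a rough sketch of what "giving it the input and output interfaces" can look like (TypeScript here; the invoice domain and every name below are made up for illustration, not taken from a real project): the types and the function signature are written by hand, and only the body is left for the LLM to fill in.

```typescript
// Hypothetical example: all names are invented for illustration.

// Input contract: what the caller hands to the function.
interface InvoiceInput {
  items: { unitPriceCents: number; quantity: number }[];
  taxRatePercent: number;
}

// Output contract: what the function must return.
interface InvoiceTotals {
  subtotalCents: number;
  taxCents: number;
  totalCents: number;
}

// The signature is fixed by hand; the body is the part an LLM would fill in.
function computeTotals(input: InvoiceInput): InvoiceTotals {
  // Sum price * quantity over all line items.
  const subtotalCents = input.items.reduce(
    (sum, item) => sum + item.unitPriceCents * item.quantity,
    0,
  );
  // Round tax to whole cents.
  const taxCents = Math.round((subtotalCents * input.taxRatePercent) / 100);
  return { subtotalCents, taxCents, totalCents: subtotalCents + taxCents };
}
```

With the contract pinned down like this, reviewing the generated body is mostly a matter of checking it against types you already understand.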

Having the AI also help with unit tests around business logic has been super helpful, in addition to manual testing like normal. It feels like our overall velocity and code quality have been going up regardless of what some of these articles are saying.

replies(2): >>rustyh+Ek >>jinhku+WG3

>>micw+(OP)
I use Claude Code a lot, but one thing that really concerned me was when I asked it about some ideas I've had in an area I'm very familiar with. Its response was to constantly steer me away from what I wanted to do towards something else, which was a fine but mediocre way to do things. It made me question how many times I've let it go off and do stuff without checking it thoroughly.

replies(4): >>physic+Z3 >>ozlike+y5 >>xgb84j+m8 >>Sammi+1d4

>>mirsad+S3
I've had quite a bit of the "tell it to do something in a certain way", it does that at first, then a few messages of corrections and pointers, it forgets that constraint.

replies(2): >>embedd+Or >>int_19+B27

>>micw+(OP)
I think what we'll see is that, as AI companies collect more usage data, the requirement to know what you're doing will sink lower and lower. Whatever advantage we have now is transient.

>>micw+(OP)
In my case I built a video editing tool fully customized for a community of which I am a member. I could do it in a few hours. Otherwise I wouldn't have even started this project, as I don't have much free time, even though I have been coding for 25+ years.

I see it as empowering for building custom tooling that doesn't need to be a high-quality, maintained project.

>>mirsad+S3
Call me a conspiracy theorist, and granted much of this could be attributed to the fact that the majority of code in existence is shit, but I'm convinced that these models are trained and encouraged to produce code that is difficult for humans to work on, further driving and cementing the usage of them when you inevitably have to come back and fix it.

replies(4): >>trcf23+l7 >>except+H7 >>CatMus+38 >>Perz1v+La

>>micw+(OP)
Also, most of the studies shown are starting to become obsolete given AI's rapid pace of improvement. Opus 4.5 has been a huge game changer for me (combined with CC, which I had not used before) since December. Claude Code arrived this summer if I’m not mistaken.

So I’m not sure a study from 2024, or one on the impact on code produced during 2024-2025, can be used to judge current AI coding possibilities.

replies(1): >>jacomo+4h

>>micw+(OP)
> But you still need to know how to do things properly in general, otherwise the results are bad.

Even that could use some nuance. I'm generating presentations in interactive JS. If they work, they work - that's the result, and I extremely don't care about the details for this use case. Nobody needs to maintain them, nobody cares about the source. There's no need for "properly" in this case.

replies(1): >>samusi+y33

>>ozlike+y5
Or it takes a lot of time, effort, and intelligence to produce good code and AI is not there yet…

>>ozlike+y5
I don't think they would be able to have an LLM without the flaws. The problem is that an LLM cannot distinguish between sense and nonsense in a logical way. If you train an LLM on a lot of sensible material, it will try to reproduce it by matching training material context and prompt context. The system does not work on the basis of logical principles, but it can sound intelligent.

I think LLM producers can improve their models by quite a margin if customers train the LLM for free, meaning: if people correct the LLM, the companies can use the session context + feedback as training data. This enables more convincing responses for finer nuances of context, but it still does not work on logical principles.

LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.

replies(1): >>andrek+ooa

>>ozlike+y5
This could be the case even without an intentional conspiracy. It's harder to give negative feedback to poor quality code that's complicated vs. poor quality code that's simple.

Hence the feedback these models get could theoretically funnel them to unnecessarily complicated solutions.

No clue whether any research has been done into this; just a thought OTTOMH.

>>mirsad+S3
Mediocre is fine for many tasks. What makes a good software engineer is that he spots the few places in every piece of software where mediocre is not good enough.

>>ozlike+y5
It is a mathematical, averaging model after all

>>micw+(OP)
Yes, but in my experience this sometimes works great, and other times you paint yourself into a corner and the sum total is that you still have to learn the thing; just the initial ramp is less steep. For example, I built myself a nice pipeline for converting JPEGs on disk to h264 on disk via zero-copy nvjpeg to nvenc, with Python bindings, but have been pulling out my hair over B-frame ordering, weird delays in playback, etc. Nothing unsolvable, but I had to learn a great deal, and when we were in the weeds, Opus was suggesting stupid hacky quick fixes that turned into whack-a-mole with the tests. In the end I had to learn enough and read enough to be able to ask it with the right vocabulary to make it work.

It's similar when entering many novel areas. Initially I get a rush because it "just works", but it really only works for the median case at first, and it's up to you to even know what to test. And AIs can be quite dismissive of edge cases, saying things like "this will not happen in most cases so we can skip it".

replies(1): >>embedd+gR

>>trcf23+d7
Agreed, this space moves so fast that 2024 feels like light-years away in terms of capabilities.

>>joshbe+t3
100% agree about AI expanding test coverage out from my own key and edge-case tests.

I agree, I write out the sketch of what I want. With a recent embedded project in C I gave it a list of function signatures and a high-level description and was very satisfied with what it produced. It would have taken me days to nail down the particulars of the HAL (like what kind of sleep do I want, and what precisely is the way to set up the WDT and ports).

I think it's also language dependent.

I imagine JavaScript can be a crap shoot. The language is too forgiving.

Rust is where I have had most success. That is likely a personal skill issue: I know we want an Arc<DashMap>, but will I remember all the foibles of accessing it? No.

But given the rigidity of the compiler and strong typing, I can focus on what the code is functionally doing: that I'm happy with the shape/interface and function signatures, and that the compiler is happy with the code.

It's quite fast work. It lets me use my high level skills without my lower level skills getting in the way.

And I'd rather rewrite the code at a mid level than start it fresh, and I agree with others that once it's a large code base I'm too far behind in understanding the overall system to easily work on it. That's true of human products too - someone else's code always gives me the ick.

replies(1): >>joshbe+zm

>>micw+(OP)
I've found this is the exact opposite of what I'd dare do with AI: things you don't understand are things you can't verify. Say you want a windowed pane for your cool project, so you ask an AI to draft a design. It looks cool and it works! Until you bring it outside, where after 30 minutes it turns into explosive shrapnel, because the model didn't understand thermal expansion, and neither did you.

Contrast this to something you do know but can't be arsed to make; you can keep re-rolling a design until you get something you know and can confirm works. Perfect, time saved.

>>rustyh+Ek
Vanilla JavaScript is hit or miss for anything complex.

Using TypeScript works great because you can still build out the interfaces, and with IDE integrations the AIs can read the language server results, so they get all the type hints.
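A small illustration of the kind of feedback the type checker gives (a hypothetical sketch; `JobState` and `describeJob` are invented names): with a discriminated union and an exhaustiveness check, forgetting to handle a case is a compile error that the language server reports, instead of a silent `undefined` at runtime as in vanilla JavaScript.

```typescript
// Hypothetical example: names are invented for illustration.
type JobState =
  | { kind: "queued" }
  | { kind: "running"; progressPercent: number }
  | { kind: "failed"; reason: string };

function describeJob(state: JobState): string {
  switch (state.kind) {
    case "queued":
      return "waiting";
    case "running":
      // progressPercent is only accessible once kind is narrowed to "running".
      return `running (${state.progressPercent}%)`;
    case "failed":
      return `failed: ${state.reason}`;
    default: {
      // If a new variant is added to JobState and not handled above,
      // this assignment no longer type-checks.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

Whether a given agent actually reads these diagnostics depends on the tool and its IDE integration.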

I agree that the AI code is usually a pretty good starting point and gets me up to speed on new features fast, rather than starting everything from scratch. I usually end up refactoring the last 10-20% manually to give it some polish because some of the code still feels off sometimes.

>>micw+(OP)
> Or I fixed a bug in a Linux scanner driver. None of these I could have done properly (within an acceptable time frame) without AI. But also none of these I could have done properly without my knowledge and experience, even with AI

There are some things here that folks making statements like yours often omit and it makes me very sus about your (over)confidence. Mostly these statements talk in a business short-term results oriented mode without mentioning any introspective gains (see empirically supported understanding) or long-term gains (do you feel confident now in making further changes _without_ the AI now that you have gained new knowledge?).

1. Are you 100% sure your code changes didn't introduce unexpected bugs?

1a. If they did, would you be able to tell if they were behaviour bugs (i.e. no crashing or exceptions thrown) without the AI?

2. Did you understand why the bug was happening without the AI giving you an explanation?

2a. If you didn't, did you empirically test the AI's explanation before applying the code change?

3. Has fixing the bug improved your understanding of the driver behaviour beyond what the AI told you?

3a. Have you independently verified your gained understanding or did you assume that your new views on its behaviour are axiomatically true?

Ultimately, there are 2 things here: one is understanding the code change (why it is needed, why that particular implementation is better relative to others, what improvements could be made to it in the future) and the other is skill (has this experience boosted your OWN ability in this particular area? in other words, could you make further changes WITHOUT using the AI?).

This reminds me of people that get high and believe they have discovered these amazing truths. Because they FEEL it not because they have actual evidence. When asked to write down these amazing truths while high, all you get in the notes are meaningless words. While these assistants are more amenable to get empirically tested, I don't believe most of the AI hypers (including you in that category) are actually approaching this with the rigour that it entails. It is likely why people often think that none of you (people writing software for a living) are experienced in or qualified to understand and apply scientific principles to build software.

Arguably, AI hypers should lead with data not with anecdotal evidence. For all the grandiose claims, empirical data obtained under controlled conditions on this particular matter is conspicuous by its absence.

replies(5): >>KptMar+No >>jacque+ip >>mlrtim+yy >>micw+E11 >>aaaasm+LV2

>>netdev+ko
Why would you ever, outside flight and medical software, care about being 100% sure that the change did not introduce any bugs?

replies(2): >>jacque+np >>bandra+Jp

>>netdev+ko
It's incredible that within two minutes of being posted this comment is already grayed out, even though it makes a number of excellent points.

I've been playing with various AI tools and homebrew setups for a long time now and while I see the occasional advantage it isn't nearly as much of a revolution as I've been led to believe by a number of the ardent AI proponents here.

This is starting to get into 'true believer' territory: you get these two camps 'for and against' whereas the best way forward is to insist on data rather than anecdotes.

AI has served me well, no doubt about that. But it certainly isn't a passe-partout and the number of times it has caused gross waste of time because it insisted on chasing some rabbit simply because it was familiar with the rabbit adds up to a considerable loss in productivity.

The scientific principle is a very powerful tool in such situations and anybody insisting on it should be applauded. It separates fact from fiction and allows us to make impartial and non-emotional evaluations of both theories and technologies.

replies(1): >>svara+Qr

>>KptMar+No
Because bugs are bad. Fixing one bug but accidentally introducing three more is such a pattern it should have a name.

replies(2): >>KptMar+Tr >>mining+Iu

>>KptMar+No
Because why would you make something broken when you could make something not broken?

replies(1): >>KptMar+Wr

>>micw+(OP)
Huh. I'm extremely skeptical of AI in areas where I don't have expertise, because in areas where I do have expertise I see how much it gets wrong. So it's fine for me to use it in those areas because I can catch the errors, but I can't catch errors in fields I don't have any domain expertise in.

replies(1): >>perryg+wK2

>>physic+Z3
> it does that at first, then a few messages of corrections and pointers, it forgets that constraint.

Yup, most models suffer from this. Everyone is raving about million-token context, but none of the models can actually get past 20% of that and still give responses as high-quality as the very first message.

My whole workflow right now is basically composing prompts outside of the agent, letting it run with them and, if something is wrong, restarting the conversation from 0 with a rewritten prompt. None of that "No, what I meant was ..."; instead I rewrite the prompt so the agent essentially solves it without any back and forth, precisely because of the issue you mention.

Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.

replies(4): >>physic+D11 >>throwd+m51 >>jwalto+Sq3 >>jinhku+1E3

>>jacque+ip
> (...) you get these two camps 'for and against' whereas the best way forward is to insist on data rather than anecdotes.

I think that's an issue with online discussions. It barely happens to me in the real world, but it's huge on HN.

I'm overall very positive about AI, but I also try to be measured and balanced and learn how to use it properly. Yet here on HN, I always get the feeling people responding to me have decided I am a "true believer" and respond to the true believer persona in their head.

>>jacque+np
They are. And we have processes to minimize them - tests, code review, staging/preprod envs - but they are nowhere close to making us 100% sure that code is bug free - that's just way too high a bar for both AI and purely human workflows outside of a few pretty niche fields.

replies(1): >>jacque+pt

>>bandra+Jp
Because it's way too high a bar to be 100% sure outside of a few niche fields.

>>KptMar+Tr
When you use AI to 'fix' something you don't actually understand, the chances of this happening go up tremendously.

>>jacque+np
I propose "the whack-a-hydra" pattern

replies(1): >>jacque+ov

>>mining+Iu
Hehe, yes, very apt. It immediately gives the right mental image.

>>netdev+ko
> 1. Are you 100% sure your code changes didn't introduce unexpected bugs?

How often have you written code and been 100% sure your code didn't introduce ANY bugs?

Seriously, for most of the code out there who cares? If it's in a private or even public repo, it doesn't matter.

>>bonobo+le
Yeah, knowing what words to use is half the battle. Quickly throw out a prompt like "Hey, `make build` takes five minutes, could you make it fast enough to run under 1 minute" and the agent will do some work and say "Done, now the build takes 25 seconds as we're skipping the step of building the images, use `make build INCLUDE_IMAGES=true` when you want to build with images". It's not wrong, given the prompt, but it takes a bit to get used to how they approach things.

replies(1): >>kaelwd+pN7

>>embedd+Or
Yes, agreed. I find it interesting that people are saying they're building these huge multi-agent workflows, since the projects I've tried it on are not necessarily huge in complexity. I've tried a variety of different things re: instructions files, etc. at this point.

replies(1): >>embedd+O71

>>netdev+ko
Thanks for pointing these things out. I always try to learn and understand the generated code and changes. Maybe not so deep for the android app (since it's just my own pet project). But especially for every pull request to a project. Everyone should do this out of respect to the maintainers who review the change.

> Are you 100% sure your code changes didn't introduce unexpected bugs?

Who ever is? But I do code reviews and I usually generate a bunch of tests along with my PRs (if the project has at least _some_ test infrastructure).

Same applies for the rest of the points. But that's only _my_ way to do these things. I can imagine that others do it a different way and that the points above are more problematic then.

replies(1): >>netdev+Dg1

>>embedd+Or
I call this the Groundhog Day loop

replies(1): >>embedd+e71

>>throwd+m51
That's a strange name, why? It's more like an "iterate and improve" loop. "Groundhog Day" to me would imply "the same thing over and over", but then you're really doing something wrong if that's your experience. You need to iterate on the initial prompt if you want something better/different.

replies(1): >>throwd+Rt2

>>physic+D11
So far, I haven't seen any demonstration of those kinds of multi-agent workflows ending up with code that won't fall over itself in some days/weeks. Most efforts so far seem to have been focused on producing as much code as possible, as fast as possible, while what I'd like to see, if anything, is the opposite of that.

Anytime I ask for demonstration of what the actual code looks like, when people start talking about their own "multi-agent orchestration platforms" (or whatever), they either haven't shared anything (yet), don't care at all about how the code actually is and/or the code is a horrible vibeslopped mess that contains mostly nonsense.

>>micw+E11
> I always try to learn and understand the generated code and changes

Not to be pedantic but, do you _try_ to understand? Or do you _actually_ understand the changes? This suggests to me that there are instances where you don't understand the generated code on projects other than your own, which is literally my point and that of many others. And even if you did understand it, as I pointed out earlier, that's not enough. It is a low bar imo. I will continue to keep my mind open, but yours isn't a case study supporting the use of these assistants but the opposite.

In science, when a new idea is brought forward, it gets grilled to no end. The greater the potential, the harder the grilling. Software should be no different if the builders want to lay claim to the name "engineer". It is sad to see a field that claims to apply scientific principles to the development of software not walking the walk.

>>embedd+e71
I thought "iterate and improve" was exactly what Phil did.

>>bandra+0q
I feel the same way. LLM errors sound most plausible to those who know least.

On complex topics where I know what I'm talking about, model output contains so much garbage with incorrect assumptions.

But complex topics where I'm out of my element, the output always sounds strangely plausible.

This phenomenon writ large is terrifying.

>>netdev+ko
>Arguably, AI hypers should lead with data not with anecdotal evidence

This reminds me of people who get sad when they realize they haven’t discovered anything amazing.

I am pedantic and “people that” → “people who” (for people, who is preferred).

>>virapt+j7
Agree. Or internal or personal tooling where it doesn't need to be perfect and you can (1) verify by observing behavior of the software -- if it works it works, and (2) deal with edge cases as they arise because it's not mission critical.

>>embedd+Or
LLMs do a cool parlour trick; all they do is predict “what should the next word be?” But they do it so convincingly that in the right circumstances they seem intelligent. But that’s all it is; a trick. It’s a cool trick, and it has utility, but it’s still just a trick.

All these people thinking that if only we add enough billions of parameters when the LLM is learning and add enough tokens of context, then eventually it’ll actually understand the code and make sensible decisions? These same people perhaps also believe if Penn and Teller cut enough ladies in half on stage they’ll eventually be great doctors.

>>embedd+Or
been experimenting with the same flow as well, it is sort of the motivation behind this project - to streamline the generate code -> detect gaps -> update spec -> implement flow.

curious to hear if you are still seeing code degradation over time?

>>joshbe+t3
How granular do you go with the interfaces? Full function signatures + types, or more like module-level contracts?

wondering what sort of artifacts beyond ADR/natural language prompts help steer LLMs to do the right thing

>>mirsad+S3
People pay a lot of money to have people make mediocre solutions for them.

Now you don't have to pay a lot of money to get a mediocre solution that works.

All those things that are broken, but you don't have time or money for them, you can have them fixed now.

>>physic+Z3
Create an AGENTS.md that says something like, "when I tell you to do something in a certain way, make a note of this here".

The only catch is that you need to periodically review it because it'll accumulate things that are not important, or that were important but aren't anymore.

>>embedd+gR
"Make all the tests pass"

"Ok, I've deleted all the failing tests"

>>except+H7

  > if people correct the LLM, the companies can use the session context + feedback as training data.

it definitely seems that way; just the other day coderabbit was asking me where i found x when it had told me x didn't exist...

  > LLM interaction with customers might become the real learning phase.

sometimes i wonder why i pay for this if i'm supposed to train this thing...