I started out very sceptical. When Claude Code landed, I got completely seduced — borderline addicted, slot machine-style — by what initially felt like a superpower. Then I actually read the code. It was shockingly bad. I swung back hard to my earlier scepticism, probably even more entrenched than before.
Then something shifted. I started experimenting. I stopped giving it orders and began using it more like a virtual rubber duck. That made a huge difference.
It’s still absolute rubbish if you just let it run wild, which is why I think “vibe coding” is basically just “vibe debt” — because it just doesn’t do what most (possibly uninformed) people think it does.
But if you treat it as a collaborator — more like an idiot savant with a massive brain but no instinct or nous — or better yet, as a mech suit [0] that needs firm control — then something interesting happens.
I’m now at a point where working with Claude Code is not just productive, it actually produces pretty good code, with the right guidance. I’ve got tests, lots of them. I’ve also developed a way of getting Claude to document intent as we go, which helps me, any future human reader, and, crucially, the model itself when revisiting old code.
What fascinates me is how negative these comments are — how many people seem closed off to the possibility that this could be a net positive for software engineers rather than some kind of doomsday.
Did Photoshop kill graphic artists? Did film kill theatre? Not really. Things changed, sure. Was it “better”? There’s no counterfactual, so who knows? But change was inevitable.
What’s clear is this tech is here now, and complaining about it feels a bit like mourning the loss of punch cards when terminals showed up.
[0]: https://matthewsinclair.com/blog/0178-why-llm-powered-progra...
Busy code I need to generate is difficult to do with AI too. Because then you need to formalize the necessary context for an AI assistant, which is exhausting with an unsure result. So perhaps it is just simpler to write it yourself quickly.
I understand comments being negative, because there is so much AI hype without having to many practical applications yet. Or at least good practical applications. Some of that hype is justified, some of it is not. I enjoyed the image/video/audio synthesis hype more tbh.
Test cases are quite helpful and comments are decent too. But often prompting is more complex than programming something. And you can never be sure if any answer is usable.
Sure it was easier to do it myself. But putting in the time to train, give context, develop guardrails, learn how to monitor etc ultimately taught me the skills needed to delegate effectively and multiply the teams output massively as we added people.
It's early days but I'm getting the same feeling with LLMs. It's as exhausting as training an overconfident but talented intern, but if you can work through it and somehow get it to produce something as good as you would do yourself, it's a massive multiplier.
If you know what you're doing you can still "teach" them though, but it's on you to do that - you need to keep on iterating on things like the system prompt you are using and the context you feed in to the model.
If an LLM learned something when you gave it commands, it would probably be reflected in some adjusted weights in some of its operational matrix. This is true of human learning, we strengthen some neural connection, and when we receive a similar stimuli in a similar situation sometime in the future, the new stimuli will follow a slightly different path along its neural pathway and result in a altered behavior (or at least have a greater probability of an altered behavior). For an LLM to “learn” I would like to see something similar.
Admittedly, you have to wrap LLMs to with stuff to get them to do that. If you want to rewrite the rules to excluded that then I will have to revise my statement that it is "mostly, but not completely true".
:-P
I think SSR schedulers are a good example of a Machine Learning algorithms that learns from it’s previous interactions. If you run the optimizer you will end up with a different weight matrix, and flashcards will be schedule differently. It has learned how well you retain these cards. But an LLM that is simply following orders has not learned anything, unless you feed the previous interaction back into the system to alter future outcomes, regardless of whether it “remembers” the original interactions. With the SSR, your review history is completely forgotten about. You could delete it, but the weight matrix keeps the optimized weights. If you delete your chat history with ChatGPT, it will not behave any differently based on the previous interaction.