Like sure, I can ask Claude to give me the barebones of a web service that does some simple task. Or a webpage with some information on it.
But any time I've tried to get AI services to help with bugfixing/feature development on a large, complex, potentially multi-language codebase, it's useless.
And those tasks are the ones that actually take up the majority of my time. On the occasion that I'm spinning a new thing up quickly, I don't really need an AI to do it for me -- I mean, that's the easy part!
Is there something I'm missing? Am I just not using it right? I keep seeing people talk about how addictive it is, how the productivity boost is insane, how all their code is now written by AI and then audited, and I just don't see how that's possible outside of really simple rote programming.
The talk about it makes more sense when you remember most developers are primarily writing CRUD webapps or adware, which is essentially a solved problem already.
It's a moderately useful tool for me. I suspect the people that get the most use out of it are those who would take more than an hour to read code I'd take 10 minutes to read. Which is to say, the least experienced people get the most value.
I guess I could keep tweaking the magic incantations here and there until it works, and I suppose that's the way it's done. But I wasn't hooked.
I do get value out of LLMs for isolated, broken-down subtasks, where asking an LLM is quicker than googling.
For me, AI will probably become really useful once it can scan and integrate my own complex codebase, so it gives me solutions that work there and doesn't hallucinate API endpoints or jump between incompatible library versions (my main issue).
Also, you have to learn how to talk to it and how to ask it things.
I typically use it to whip up a CLI tool or script to do something that would have been too fiddly otherwise.
While sitting in a Teams meeting I got it to use the Roslyn compiler SDK in a CLI tool that stripped a very repetitive pattern from a code base. Some OCD person had repeated the same nonsense many thousands of times. The tool cleaned up the mess in seconds.
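The actual tool was C# on top of the Roslyn SDK, but the shape of these one-off cleanup scripts is pretty generic. A toy Python sketch of the idea, with the pattern being stripped entirely made up for illustration:

    # Toy one-off cleanup script: walk a source tree and rewrite one
    # repetitive pattern everywhere it appears. The pattern here (a verbose
    # null-check around Dispose) is invented; the real tool used Roslyn so it
    # could work on syntax trees rather than raw text.
    import pathlib
    import re
    import sys

    PATTERN = re.compile(r"if \((\w+) != null\) \{ \1\.Dispose\(\); \}")

    def cleanup(root: str) -> None:
        for path in pathlib.Path(root).rglob("*.cs"):
            text = path.read_text()
            new_text = PATTERN.sub(r"\g<1>?.Dispose();", text)
            if new_text != text:
                path.write_text(new_text)
                print(f"cleaned {path}")

    if __name__ == "__main__":
        cleanup(sys.argv[1])

The point is less the code than that the LLM produces this kind of throwaway tool while you're half paying attention to a meeting.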
Almost everybody doing serious work with LLMs is using an agent, which means that the LLM is authoring files, linting them, compiling them, and iterating when it spots problems.
There's more to using LLMs well than this, but this is the high-order bit.
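If it helps to see it concretely, here's a rough sketch of what that loop looks like mechanically. It's Python, ask_model() is a hypothetical stand-in for whatever model client you actually use, and the checker commands are just examples:

    # Sketch of the agent loop described above: the model writes a file, the
    # harness lints and compiles it, and any error output is fed back to the
    # model until the checks pass (or we give up).
    import subprocess

    def ask_model(prompt: str) -> str:
        raise NotImplementedError("plug your LLM client in here")

    def run(cmd: list[str]) -> tuple[int, str]:
        p = subprocess.run(cmd, capture_output=True, text=True)
        return p.returncode, p.stdout + p.stderr

    def agent_loop(task: str, path: str, max_iters: int = 5) -> bool:
        prompt = f"Write {path} to accomplish this task: {task}"
        for _ in range(max_iters):
            code = ask_model(prompt)
            with open(path, "w") as f:
                f.write(code)
            # Lint, then byte-compile; stop as soon as both pass.
            for cmd in (["ruff", "check", path], ["python", "-m", "py_compile", path]):
                rc, output = run(cmd)
                if rc != 0:
                    prompt = (f"`{' '.join(cmd)}` failed with:\n{output}\n"
                              f"Fix {path}. Original task: {task}")
                    break
            else:
                return True  # both checks passed
        return False  # gave up after max_iters attempts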
later
Oh, I like Zed a lot too. People complain that Zed's agent (the back-and-forth with the model) is noticeably slower than the other agents, but to me, it doesn't matter: all the agents are slow enough that I can't sit there and wait for them to finish, and Zed has nice desktop notifications for when the agent finishes.
Plus you get a pretty nice editor --- I still write exclusively in Emacs, but I think of Zed as being a particularly nice code UI for an LLM agent.
For code changes I prefer to paste a single function in, or a small file, or error output from a compile failure. It’s pretty good at helping you narrow things down.
So, for me, it’s a pile of small gains where the value is—because ultimately I know what I generally want to get done and it helps me get there.
I have this workflow where I trigger a bunch of prompts in the morning, lunch and at the end of the day. At those same times I give it feedback. The async nature really means I can have it work on things I can’t be bothered with myself.
If you aren't building up mental models of the problem as you go, you end up in a situation where the LLM gets stuck at the edges of its capability, and you have no idea how even to help it overcome the hurdle. Then you spend hours backtracking through what it's done building up the mental model you need, before you can move on. The process is slower and more frustrating than not using AI in the first place.
I guess the reality is, your luck with AI-assisted coding really comes down to the problem you're working on, and how much of it is prior art the LLM has seen in training.
I've successfully been able to test out new libraries and do explorations quickly with AI coding tools, and I can then take those working examples and fix them up manually to bring them up to my coding standards. I can also extend the useful lifespan of these tools by doing cleanup cycles where I manually tidy the code, since they work better with cleaner encapsulation, and then have them work on one scoped component at a time.
I've found that they're great for testing out ideas and learning more quickly, but my goal is to better understand the technologies I'm prototyping myself; I'm not trying to get it to output production-quality code.
I do think there's a future where LLMs can operate in a well-architected production codebase with proper type-safe compilation, linting, testing, encapsulation, code review, etc., but on a very tight leash, because without oversight, quality control, and correction they'll quickly degrade your codebase.
If it helps, for context: I'll go round and round with an agent until I've got roughly what I want, and then I go through and beat everything into my own idiom. I don't push code I don't understand and most of the code gets moved or reworked a bit. I don't expect good structure from LLMs (but I also don't invest the time to improve structure until I've done a bunch of edit/compile/test cycles).
I think of LLMs mostly as a way of unsticking and overcoming inertia (and writing tests). "Writing code", once I'm in flow, has always been pleasant and fast; the LLMs just get me to that state much faster.
I'm sure training data matters, but I think static typing and language tooling matters much more. By way of example: I routinely use LLMs to extend intensely domain-specific code internal to our project.
I find that context window (token memory) limits are the main barrier here.
Once the LLM starts forgetting other parts of the application, all bets are off and it will hallucinate the dumbest shit, or even just remove features wholesale.
I haven't seen any mentions of Augment Code yet in comment threads on HN. Does anyone else use Augment Code?
It keeps _me_ from context switching into agent manager mode. I do the same thing for doing code reviews for human teammates as well.
"The programming language of the future will be English"
---
"Well are you using it right? You have to know how to use it"
What were they doing?
Like most of the code agents, it works best with tight, testable loops. But it has a concept of short vs. long tests and will give you plans and confidence values to help you refine your prompt if you want.
I tend to just let it go. If it gets to a 75%-done spot that isn't worth more back and forth, I grab the PR and finish it off.
It has a very good system prompt so the code is pretty good without a lot of fluff.
Clearly something like “server telemetry” is the datacenter’s “CRUD app” analogue.
It’s a solved problem that largely requires rtfm and rote execution of well worn patterns in code structure.
Please stick to the comment guidelines:
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
Generating function documentation hasn't been that useful either as the doc comments generated offer no insight and often the amount I'd have to write to get it to produce anything of value is more effort than just writing the doc comments myself.
For my personal project in Zig they either get lost completely or give me terrible code (my code isn't _that_ bad!). There seems to be no middle ground here. I've even tried the tools as pair programmers, but they often get lost or stuck in loops, repeating the same thing that's already been mentioned (it likely falls out of the context window).
When it comes to others using such tools, I've had to ask them to stop using them to think, as it becomes next to impossible to teach / mentor if they're passing what I say to the LLM or trying to have it perform the work. I'm confident in debugging people when it comes to math / programming, but with an LLM in between it's just not possible to guess where they went wrong or how to bring them back to the right path, as the thought process is lost (or there wasn't one to begin with).
This is not even "vibe coding"; I've just never found it generally useful enough to use day-to-day for any task, and my primary use of, say, phind has been as an alternative to qwant when I cannot game the search query well enough to get the results I'm looking for (i.e. I ignore the LLM output and just look at the references).
The one thing I really appreciated, though, was the AI's ability to do a "fuzzy" search in occasional moments of need. For example, sometimes the colloquial term for a feature didn't match the naming conventions in the source code. The AI could find associations in commit messages and review information to save me time rummaging through git-blame. Like I said though, that sort of problem wasn't necessarily a bottleneck and could often be solved much more cheaply by asking a coworker on Slack.
That does nothing except add visual noise.
It's like a magic incantation to make the errors go away (it doesn't actually), probably by someone used to Visual Basic's "ON ERROR RESUME NEXT" or some such.
That's because whatever training the model had, it didn't cover anything remotely similar to the codebase you worked on.
We get this issue even with obscure FLOSS libraries.
When we fail to provide context to LLMs, they generate examples by following superficial cues like coding conventions. In extreme cases, such as code that employs source code generators or templates, LLMs even fill in function bodies that the code generators are designed to generate for you. That's because, if LLMs are oblivious to the context, they hallucinate their way into something seemingly coherent. Unless you provide them with context or instruct them not to make things up, they will bullshit their way into an example.
What's truly impressive about this is that often times the hallucinated code actually works.
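A made-up illustration of that failure mode (hypothetical file and names, not from any real project):

    # settings_gen.py -- hypothetical stub that the project's code generator
    # overwrites at build time; nothing here is from a real project.
    class Settings:
        def load(self, path: str) -> dict:
            raise NotImplementedError("body is produced by the generator")

    # Without that context, an LLM asked to "show how Settings works" will
    # happily invent a body instead, something like:
    #
    #     def load(self, path: str) -> dict:
    #         import json
    #         with open(path) as f:
    #             return json.load(f)
    #
    # It looks plausible and may even run, which is exactly the "hallucinated
    # code that actually works" problem: it silently replaces whatever the
    # real generator was supposed to produce.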
> Generating function documentation hasn't been that useful either as the doc comments generated offer no insight and often the amount I'd have to write to get it to produce anything of value is more effort than just writing the doc comments myself.
Again, this suggests a failure on your side to provide any context.
If you give them enough context, LLMs synthesize and present it almost instantly. If you're prompting an LLM to generate documentation, which boils down to synthesizing what an implementation does and what its purpose is, and the LLM comes up empty, that means you failed to give it anything to work on.
The bulk of your comment screams failure to provide any context. If your code steers far away from what it expects, fails to follow any discernible structure, and doesn't even convey purpose and meaning in little things like naming conventions, you're not giving the LLM anything to work on.
I guess my point is, I have no use for LLMs in their current state.
> That's because whatever training the model had, it didn't cover anything remotely similar to the codebase you worked on.
> We get this issue even with obscure FLOSS libraries.
This is the issue, however, as unfamiliar codebases are exactly where I'd want to use such tooling. Not working in those cases makes it less than useful.
> Unless you provide them with context or instruct them not to make things up, they will bullshit their way into an example.
In all cases context was provided extensively, but at some point it's easier to just write the code directly. The context is in the surrounding code, and if the tool cannot pick up on that even when combined with explicit direction, it's again less than useful.
> What's truly impressive about this is that often times the hallucinated code actually works.
I haven't experienced the same. It fails more often than not, and the result is much worse than the hand-written solution regardless of the level of direction. This may be due to unfamiliar code, but again, if the code is common then I'm likely already familiar with it, which lowers the value of the tool.
> Again, this suggests a failure on your side to provide any context.
This feels like a case of blaming the user without full context of the situation. There are comments, the names are descriptive and within reason, and there's annotation of why certain things are done the way they are. The purpose of a doc comment is not "this does X" but rather _why_ you want to use this function and what its purpose is, which is something LLMs struggle to derive, from my testing of them. Adding enough direction to convey that is effectively writing the documentation yourself, with a crude English-to-English compiler in between. It's the same problem with unit test generation: unit tests are not there to game code coverage but to provide meaningful tests of the domain and the known edge cases of a function, which is again something the LLM struggles with.
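To make that concrete with a contrived example (the function here is hypothetical, not from my codebase):

    def normalize_scores(scores: list[float]) -> list[float]:
        # What generated doc comments tend to give me: "Normalizes the scores."
        # What I actually need documented: scale scores so they sum to 1 so
        # runs of different lengths can be compared; call this before
        # aggregating, and filter out empty runs first or it divides by zero.
        total = sum(scores)
        return [s / total for s in scores]

The first kind of comment is what I get for free; the second is the part that only exists in my head until I write it down, and the LLM can't conjure it from the code alone.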
For any non-junior task LLM tools are practically useless (from what I've tested) and for junior level tasks it would be better to train someone to do better.
With a web-based system you need repomix or something similar to give the whole project (or parts of it if you can be bothered to filter) as context, which isn't exactly nifty
Cursor is fine, Claude Code and Aider are a bit too janky for me - and tend to go overboard (making full-ass git commits without prompting) and I can't be arsed to rein them in.
I could spend 5-10 minutes digging through the docs for the correct config option, or I can just tap a hotkey, open up GitHub Copilot in Rider and tell it what I want to achieve.
And within seconds it had a correct-looking setting ready to insert to my renovate.json file. I added it, tested it and it works.
I kinda think people who diss AIs are prompting something like "build me Facebook" and then being disappointed when it can't :D
Inconsistency and crap code quality aren't solved yet, and these make the agent workflow worse because the human only gets to nudge the AI in the right direction very late in the game. The alternative, interactive, non-agentic workflows allow for more AI-hand-holding early, and better code quality, IMO.
Agents are fine if no human is going to work on the (sub)system going forward, and you only care about the shiny exterior without opening the hood to witness the horrors within.
I have definitely not seen this in my experience (with Aider, Claude and Gemini). While helping me debug an issue, Gemini added a #!/bin/sh line to the middle of the file (which appeared to break things), and despite having that code in the context didn't realise it was the issue.
OTOH, when asking for debugging advice in a chat window, I tend to get more useful answers, as opposed to a half-baked implementation that breaks other things. YMMV, as always.
Based on what I've seen, Python and TypeScript are where it fares best. Other languages are much more hit and miss.
For instance, dealing with files that don't quite work correctly between two 3D applications because of slightly different implementations. Ask for a python script to patch the files so that they work correctly – done almost instantly just by describing the problem.
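The scripts themselves are nothing fancy. A sketch of the kind of thing I mean, with the specific incompatibility (a "mat_" prefix on material names) invented purely for illustration:

    # Hypothetical example: application A exports OBJ-style text files that
    # application B rejects because of a naming difference.
    import sys

    def patch_file(src: str, dst: str) -> None:
        with open(src) as fin, open(dst, "w") as fout:
            for line in fin:
                # Strip the prefix B chokes on, leave everything else alone.
                if line.startswith("usemtl mat_"):
                    line = line.replace("usemtl mat_", "usemtl ", 1)
                fout.write(line)

    if __name__ == "__main__":
        patch_file(sys.argv[1], sys.argv[2])

Writing this by hand is trivial but fiddly; describing the problem and getting the script back in seconds is where the time savings are.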
Also for prototyping. Before you spend a month crafting a beautiful codebase, just get something standing up so you can evaluate whether it's worth spending time on – like, does the idea have legs?
90% of programming problems get solved with a rubber ducky – and this is another valuable area. Even if the AI isn't correct, often just talking it through with an LLM will get you to see what the solution is.
They are very handy tools that can help you learn a foreign codebase faster. They can help you when you run into those annoying blockers that usually take hours or days or a second set of eyes to figure out. They give you a sounding board and help you ask questions and think about the code more.
Big IF here. IF you bother to read. The danger is some people just keep clicking and re-prompting until something works, but they have zero clue what it is and how it works. This is going to be the biggest problem with AI code editors. People just letting Jesus take the wheel and during this process, inefficient usage of the tools will lead to slower throughput and a higher bill. AI costs a good chunk of change per token and that's only going up.
I do think it's addictive for sure. I also think the "productivity boost" is a feeling people get, but no one measures. I mean, it's hard to measure. Then again, if you do spend an hour on a problem you get stuck on vs 3 days then sure it helped productivity. In that particular scenario. Averaged out? Who knows.
They are useful tools, they are just also very misunderstood and many people are too lazy to take the time to understand them. They read headlines and unsubstantiated claims and get overwhelmed by hype and FOMO. So here we are. Another tech bubble. A super bubble really. It's not that the tools won't be with us for a long time or that they aren't useful. It's that they are way way overvalued right now.
Regardless, Gemini 2.5 Pro is far far better and I use that with open-source free Roo Code. You can use the Gemini 2.5 Pro experimental model for free (rate limited) to get a completely free experience and taste for it.
Cursor was great and started it all off, but others took notice and now they're all more or less the same. It comes down to UX and preference, but I think Windsurf and Roo Code just did a better job here than Cursor, personally.
I challenge you to explore different perspectives.
You are faced with a service that handles any codebase that's thrown at it with incredible ease, without requiring any tweaking or special prompting.
For some reason, the same system fails to handle your personal codebase.
What's the root cause? Does it lie in the system that works everywhere with anything you throw at it? Or is it in your codebase?
Perhaps it’s time for a career change then. Follow your joy and it will come more naturally for you to want to spread it.
Again,
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
From my reading, the “strongest plausible interpretation” of the original “CRUD app” line was “it’s a solved problem that largely requires rtfm and rote execution of well worn patterns in code structure”, which puts it in the same position as “server telemetry”: the kind of thing that makes LLMs appear superintelligent to people new to programming within those paradigms.
I’m unfamiliar with “device mapping”, so perhaps someone else can confirm if it is “the crud app of Linux kernel dev” in that vein.
Just listing topics in software development is hardly evidence of either your own ability to work on them, or of their inherent complexity.
Since this seems to have hurt your feelings, perhaps a more effective way to communicate your needs would be to explain why you find “server telemetry” to be difficult/complex/w/e enough to warrant needing an LLM for you to be able to do it.
Note that language servers, static analysis tooling, and so on still work without issue.
The cause (this is my assumption) is that there aren't enough good examples in the training set for anything useful to be the most likely continuation, leading to a suboptimal result for that domain. So the tool doesn't work "everywhere": it works poorly wherever a language sees less use, or there's less public code dealing with a particular problem.