I've noticed that LLMs need a very heavy hand in guiding the architecture; otherwise they'll add architectural tech debt. One easy example: I've caught them breaking abstractions (putting things where they don't belong). Unfortunately, there isn't much self-reflection on these aspects if you ask about the quality of the code or whether there are better ways of doing it. Of course, if you spot that something is in the wrong place and prompt more specifically, they'll pick up on it immediately.
I also ended up blowing through $15 of LLM tokens in a single evening. (Previously, as a heavy LLM user including coding tasks, I was averaging maybe $20 a month.)
I wonder if the next phase would be the rise of (AI-driven?) "linters" that check that the implementation matches the architecture definition.
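To make the idea concrete, here's a rough sketch of the non-AI version of such a check; the `routes/` and `models` layer names are made up, and a real tool would presumably read the rule from an architecture definition instead of hard-coding it:

```python
# Hypothetical architecture "linter": fail CI when any module under routes/
# imports from the models layer, i.e. when the implementation drifts from the
# declared layering. Directory and layer names are invented for illustration.
import ast
import pathlib
import sys

ROUTES_DIR = pathlib.Path("routes")   # layer being checked
FORBIDDEN = "models"                  # layer that routes/ must not import directly

violations = []
for path in ROUTES_DIR.rglob("*.py"):
    tree = ast.parse(path.read_text(), filename=str(path))
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            imported = [node.module or ""]
        else:
            continue
        for name in imported:
            if name == FORBIDDEN or name.startswith(FORBIDDEN + "."):
                violations.append(f"{path}:{node.lineno}: imports {name}")

if violations:
    print("Architecture violations found:")
    print("\n".join(violations))
    sys.exit(1)
print("Implementation matches the declared layering.")
```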
This is a feature, not a bug. LLMs are going to be the next "OMG my AWS bill" phenomenon.
Everything old is new again!
Consider using Aider, and aggressively managing the context (via /add, /drop and /clear).
LLMs are now being positioned as "let them work autonomously in the background" which means no one will be watching the cost in real time.
Perhaps I can set limits on how much money each task is worth, but very few would estimate that properly.
For example, it (Gemini 2.5) really struggles with newer parts of the ecosystem like FastAPI when wiring libraries like SQLAlchemy, pytest, playwright-python, etc., together.
I find more value in bootstrapping things myself, and then using the LLM to help with boilerplate once an effective safety harness is in place.
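For reference, this is roughly the wiring (and the "safety harness") I mean; a minimal sketch with invented names, using a FastAPI session dependency over SQLAlchemy 2.0 and a pytest-style test that overrides it with an in-memory SQLite database:

```python
# A minimal sketch (made-up names, not the model's output) of FastAPI +
# SQLAlchemy + pytest wiring: a session dependency on the app, overridden
# in a test with an in-memory SQLite database.
from fastapi import Depends, FastAPI
from fastapi.testclient import TestClient
from sqlalchemy import String, create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column, sessionmaker
from sqlalchemy.pool import StaticPool


class Base(DeclarativeBase):
    pass


class Item(Base):
    __tablename__ = "items"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50))


app = FastAPI()
engine = create_engine("sqlite:///./app.db")
SessionLocal = sessionmaker(bind=engine)


def get_session():
    # Yield-style dependency: each request gets a session that is closed afterwards.
    with SessionLocal() as session:
        yield session


@app.get("/items")
def list_items(session: Session = Depends(get_session)) -> list[str]:
    return [item.name for item in session.scalars(select(Item))]


# --- pytest side: swap the real session for an in-memory test database ------
def test_list_items():
    test_engine = create_engine(
        "sqlite://",
        connect_args={"check_same_thread": False},
        poolclass=StaticPool,  # share one connection so the in-memory DB persists
    )
    Base.metadata.create_all(test_engine)
    TestSession = sessionmaker(bind=test_engine)

    def override_session():
        with TestSession() as session:
            yield session

    app.dependency_overrides[get_session] = override_session
    with TestSession() as session:
        session.add(Item(name="widget"))
        session.commit()

    client = TestClient(app)
    assert client.get("/items").json() == ["widget"]
```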
1 - https://github.com/plandex-ai/plandex
Also, a bit more on auto vs. manual context management in the docs: https://docs.plandex.ai/core-concepts/context-management
Some well-paid developers will excuse this with, "Well, if it saved me 5 minutes, it's worth an order of magnitude more than 10 cents."
Which is true, however there's a big caveat: Time saved isn't time gained.
You can "Save" 1,000 hours every night, but you don't actuall get those 1,000 hours back.
In a brownfield code base, I can often provide it reference files to pattern match against. So much easier to get great results when it can anchor itself in the rest of your code base.
Also, there's no way you can build a business in this space without providing value. Buyers are not that dumb.
The only people who believe this level of AI marketing are the people who haven't yet used the tools.
> which means no one will be watching the cost in real time.
Maybe some day there's an agentic coding tool that goes off into the weeds and runs for days doing meaningless tasks until someone catches it and does a Ctrl-C, but the tools I've used are more likely to stop short of the goal than to continue crunching indefinitely.
Regardless, it seems like a common experience for first-timers to try a light task and then realize they've spent $3, instantly setting expectations for how easy it is to run up a large bill if you're not careful.
What do you mean?
If I have some task that requires 1000 hours, and I'm able to shave it down to one hour, then I did just "save" 999 hours -- in the same way that if something costs $5 and I pay $4, I saved $1.
This is a popular workflow I first read about here[1].
This has been the most useful use case for LLMs for me. Actually getting them to implement the spec correctly is the hard part, and you'll have to take the reins and course-correct often.
[1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
You still get your 24 hours, no matter how much time you save.
What actually matters is the value of what is delivered, not how much time it actually saves you. Justifying costs by "time saved" is a good way to eat up your money on time-saving devices.
I'd also recommend creating little `README`s in your codebase that are written mainly with aider as the intended audience. In them, I'll explain the architecture, what code makes (non-)sense to write in each directory, and so on. Has the side effect of being helpful for humans, too.
Nowadays when I'm editing with aider, I'll include the project README (which contains a project overview plus pointers to the other READMEs) and whatever README is most relevant to the scope of my session. It's super productive.
I've yet to find a model that beats the cost-effectiveness of Sonnet 3.7. I've tried the latest DeepSeek models, and while I love the price (nearly 50x cheaper?), it's just far too error-prone compared to Sonnet 3.7. It generates solid plans / architecture discussions, but, unlike Sonnet, the code it generates is often confidently off the mark.
You could also say you saved 41.666 people an entire 24-hour day by "saving 1000 hours," or slice it some other fractional way.
The way you're trying to explain it as "saving 1000 hours each day" really doesn't make sense without further context.
And I'm sure if I hadn't written this comment I would be saving 1000 hours on a stupid comment thread.
That doesn’t matter anymore when you’re vibe coding it. No human is going to look at it anyway.
It can all be if/else statements on one line in one file. If it works, and if the LLMs can work on it, iterate, and implement new business requirements while keeping performance and security, then code structure, quality, and readability don't matter one bit.
Customers don't care about code quality, and the only reason businesses used to care is that it made it cheaper to build and ship new things, so they could make more money.
This is a common view, and I think it will become the norm in the near-to-mid term, especially for basic CRUD apps and websites. Context windows are still too small for anything even slightly complex (I think we need to be at about 20M tokens before we start to match human levels), but we'll be there before you know it.
Engineers will essentially become people who just guide the AIs and verify tests.
For example: “This module contains logic defining routes for serving an HTTP API. We don’t write any logic that interacts directly with db models in these modules. Rather, these modules make calls to services in `/services`, which make such calls.”
It wouldn’t make sense to duplicate this comment across every router sub-module. And it’s not obvious from looking at any one module that this rule is applied across all modules, without such guidance.
These little bits of scaffolding really help narrow down the scope of the code that LLMs eventually try to write.
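To make it concrete, here's a tiny sketch of what following that guidance looks like; it assumes a FastAPI-style app, and every name in it is invented for illustration:

```python
# Made-up mini example of the convention the README describes: the router
# module stays thin and delegates to a service, never touching DB models itself.
from fastapi import APIRouter, Depends, FastAPI


class WidgetService:
    """Stand-in for something that would live in /services and own all DB-model access."""

    def list_names(self) -> list[str]:
        # A real implementation would query the ORM models here.
        return ["example widget"]


def get_widget_service() -> WidgetService:
    return WidgetService()


router = APIRouter(prefix="/widgets")


@router.get("/")
def list_widgets(service: WidgetService = Depends(get_widget_service)) -> list[str]:
    # No DB sessions or models in the route handler; it only calls the service.
    return service.list_names()


app = FastAPI()
app.include_router(router)
```

The point is just that the route handler never opens a session or touches an ORM model; all of that stays behind the service layer.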
The initial cost was around $20 USD, which later grew to $40 (mostly polishing), plus some manual work.
I intentionally picked a simple stack: HTML + JS + PHP.
A couple of things:
* I'd say I'm happy with the result from a product perspective.
* The codebase could be better, but I couldn't care less in this case.
* By default, the AI does not care about security unless I specifically tell it to.
* Claude insisted on using old libs. When I specifically told it to use the latest and greatest, it upgraded them but left code that only works with an old version. It also mixed the latest DaisyUI with some old version of tailwindcss :)
On one hand it was super easy and fun to do; on the other hand, if I were a junior engineer, I bet it would have cost more.
It's like those coupon booklets they used to sell: "Over $10,000 of savings!"
Yes, but how much money do I have to spend in order to save that $10,000?
There was this funny commercial in the 90s for some muffler repair chain that was having a promotion: "Save Fifty Dollars..."
The theme was "What will you do with the fifty dollars you saved?" And it was people going to Disneyland or a fancy dinner date.
The people (actors) believed they were receiving $50. They acted as if it was free money. Meanwhile there was zero talk about whether their cars needed muffler repair at all.
But the LLM bill will always invoice you for all the "saved" work regardless.
It's called "Thinking past the sale". It's a common sales tactic.
[1] https://notes.jessmart.in/My+Writings/Pair+Programming+with+...
READMEs per module also help, but it really depends a lot on the model. Gemini will happily traipse all over your codebase at random; gpt-4.1 will do inline imports inside functions because it seems to lack any sort of situational awareness; Claude so far gets things mostly right.
My experience agrees that separating the README and the TODO is super helpful for managing context.
Then, as the context window increases, it's less and less of an issue.
The more of my washing you can take off me, the more of your time you can save by then using a washing machine or laundry service!
Saving an hour of my time is a waste, when saving an hour of your time is worth so much more. So it makes economic sense for you to pay me, to take my washing off me!
( Does that better illustrate my point? )
If this product is going to be successful, they are going to need the bulk of their customers to be companies with 40-100k employees.