As someone who works primarily within the Laravel stack, in PHP, LLMs are wildly effective. That's not to say there aren't warts - but my productivity has skyrocketed.
But it's become clear that when you venture into the weeds of things that aren't very mainstream, you're going to get far more hallucinations and puzzling solutions.
Another observation: when you start working outside your expertise, you'll likely spend a correlated amount of 'waste' time where the LLM spits out solutions that a domain expert would immediately recognize as problematic, but the non-expert will either reason that they seem fine or, worse, not even look at the solution and just try to use it.
100% of the time that I've tried to get Claude/Gemini/ChatGPT to "one shot" a whole feature or refactor, it's been a waste of time and tokens. But when I've spent even a minimal amount of energy to focus it on the task, curate the context, and then approach? Tremendously effective most of the time. This also requires enough mental work that I usually have an idea of how the solution should look, which primes my ability to parse the proposed code and pick up the pieces.

Another good flow is to prompt the LLM (in this case Claude Code, or something with MCP/filesystem access) with the feature/refactor/request and ask it to draw up an initial implementation plan to feed to itself. Iterate on that plan as needed, then work through it one item at a time, keeping a running {TASK_NAME}_WORKBOOK.md (which you task the LLM with keeping up to date with the relevant details) and starting a fresh session/context for each task/item on the plan, using the workbook to get the new sessions up to speed.
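For concreteness, here's a minimal sketch of what such a workbook might look like. The file name, task, and section headings are entirely illustrative; the point is just that each new session can read one small file instead of re-deriving state:

```markdown
<!-- AUTH_REFACTOR_WORKBOOK.md — illustrative name and structure -->
# Auth Refactor Workbook

## Plan
- [x] 1. Extract token validation into a dedicated service class
- [ ] 2. Swap the auth middleware over to the new service
- [ ] 3. Update tests and remove the old helper

## Decisions made
- Token TTL stays at 15 minutes; only the validation path changes.

## Files touched so far
- app/Services/TokenValidator.php (new)
- app/Http/Middleware/Authenticate.php (updated)

## Notes for the next session
- Step 2 depends on the service binding added in step 1.
```

Each fresh session starts by being told to read the workbook, and ends by being told to update it before you kill the context.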
Also, this is just a hunch, but I'm generally a nocturnal creature and tend to work from the evening into the early morning. Once 8am PST rolls around, I really feel like Claude (in particular) turns into mush. Responses get slower, and it seems to lose context where it otherwise wouldn't: getting off topic, or re-reading files it should already have in context. (Note: I'm pretty diligent about refreshing/working with the context, and something happens during the 'work' hours that makes it terrible.)
I'd imagine we're going to end up with language-specific LLMs (though I have no idea; it just seems logical to me) that a 'main' model pushes tasks/tool usage to. We don't need our "coding" LLMs to also be proficient in oceanic tidal patterns and 1800s boxing history. Those are all parameters that could have been better spent on the code.