I think LLM producers could improve their models by quite a margin if customers effectively train the LLM for free, meaning: when people correct the LLM, the companies can use the session context plus the feedback as training data. This enables more convincing responses for finer nuances of context, but it still does not teach the model logical principles.
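A minimal sketch of what "session context + feedback as training data" could look like on the provider side, assuming a chat-message log and a hypothetical `correction_to_example` helper (none of this reflects any vendor's actual pipeline):

```python
import json

def correction_to_example(session_messages, accepted_answer):
    """Turn a session where the user corrected the model into one
    supervised fine-tuning example: the conversation up to and
    including the correction becomes the prompt, and the answer the
    user finally accepted becomes the completion."""
    return {
        "prompt": session_messages,      # full conversation so far
        "completion": accepted_answer,   # the answer the user accepted
    }

# Hypothetical session: the user corrects the model mid-conversation.
session = [
    {"role": "user", "content": "What does HTTP 418 mean?"},
    {"role": "assistant", "content": "418 means Payload Too Large."},
    {"role": "user", "content": "No, 418 is 'I'm a teapot' (RFC 2324)."},
]
accepted = "HTTP 418 'I'm a teapot' is a joke status code from RFC 2324."

example = correction_to_example(session, accepted)
print(json.dumps(example)[:40])
```

Collected at scale, such examples would bias the model toward whatever users happen to accept, which is exactly why the feedback could funnel it in odd directions.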
LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.
Hence the feedback these models get could, in theory, funnel them toward unnecessarily complicated solutions.
No clue whether any research has been done into this; just a thought off the top of my head.
Yup, most models suffer from this. Everyone is raving about million-token context windows, but none of the models can actually get past 20% of that and still give responses as high quality as the very first message.
My whole workflow right now is basically composing prompts outside the agent, letting it run with them, and if something is wrong, restarting the conversation from zero with a rewritten prompt. None of that "No, what I meant was ..."; instead I rewrite the prompt so the agent essentially solves it without any back and forth, just because of this issue that you mention.
Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.
Anytime people start talking about their own "multi-agent orchestration platforms" (or whatever) and I ask for a demonstration of what the actual code looks like, they either haven't shared anything (yet), don't care at all how the code actually is, and/or the code is a horrible vibe-slopped mess that is mostly nonsense.
All these people think that if only we add enough billions of parameters when the LLM is learning and enough tokens of context, then eventually it'll actually understand the code and make sensible decisions. These same people perhaps also believe that if Penn and Teller cut enough ladies in half on stage, they'll eventually be great doctors.
curious to hear if you are still seeing code degradation over time?
Now you don't have to pay a lot of money to get a mediocre solution that works.
All those things that are broken but that you never had the time or money to fix, you can have them fixed now.
The only catch is that you need to periodically review it because it'll accumulate things that are not important, or that were important but aren't anymore.
> if people correct the LLM, the companies can use the session context + feedback to as training.
it definitely seems that way; just the other day coderabbit was asking me where i found x when it told me x didn't exist...

> LLM interaction with customers might become the real learning phase.

sometimes i wonder why i pay for this if i'm supposed to train this thing...