When people say ”fix stuff” I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).
Basically, you're saying it's not perfect. I don't think anyone is claiming otherwise.
In general, even with access to the entire code base (which is very small), I find the inherent need in the models to satisfy the prompter to be their biggest flaw since it tends to constantly lead down this path. I often have to correct over convoluted SQL too because my problems are simple and the training data seems to favor extremely advanced operations.
The issue is that there’s no common definition of ”fixed”. ”Make it run no matter what” is a more apt description in my experience, which works to a point but then becomes very painful.