"8 unit tests? Great, I'll code up 8 branches so all your tests pass!" Of course that neglects the fact that there's now actually 2^8 paths through your code.
Perhaps the fix is more advanced LLMs plus specifications plus better tests.
The reality is that AI still can't handle long-running tasks without blowing $500k worth of tokens on an end result that doesn't work, and further work costs another $100k worth to get nothing novel.