> This seems like it's fixing the symptom rather than the underlying issue?
This is also my experience when you haven't set up a proper system prompt to address this for everything an LLM does. The funniest PRs are the ones that "resolve" test failures by removing or commenting out the test cases, or by changing the assertions. Google's and Microsoft's models seem more likely to do this than OpenAI's and Anthropic's models. I wonder if there is some difference in their internal processes that is leaking through here?
The same PR as the quote above continues with 3 more messages before the human seemingly gives up:
> please take a look
> Your new tests aren't being run because the new file wasn't added to the csproj
> Your added tests are failing.
I can't imagine how the people who have to deal with this are feeling. It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.
Another PR: https://github.com/dotnet/runtime/pull/115732/files
How are people reviewing that? 90% of the page height is taken up by "Check failure", so you can hardly see the code/diff at all. And as a cherry on top, the unit test has a comment that says "Test expressions mentioned in the issue". This whole thing would be fucking hilarious if I didn't feel so bad for the humans who are on the other side of this.
At what point do the human developers just give up and close the PRs as "AI garbage"? Keep the ones that work, then just junk the rest. I feel that at some point entertaining the machine becomes unbearable and people just stop doing it or rage-close the PRs.
Microsoft's stock price is dependent on them proving that this is a success.
It's not as if Microsoft's share price has ever reflected the quality of their products.
At one point, their desktop user experience was actually pretty good. And that was all their products back then. They definitely didn't get to where they are now by selling products that were bad. You could make the argument that some of them were bad but they were cheap, but if price is a big aspect of what makes a product good in the eyes of the consumer at the time and nobody else is competing on price, then that isn't "bad" in the sense I'm using the word.
I don't think I'd have called them out for always making terrible products all the way through till about Windows 7. I had no major complaints about that release: cloud was in its infancy, there was no pushing 365, etc. After that, quality started to go downhill, to the point that I'd argue with a straight face that most major community-supported Linux DEs provide an objectively better and more stable user experience for both technical and non-technical users.