Watching AI drive Microsoft employees insane

>>laiysb+(OP)
Interesting that every comment has "Help improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative.

> This seems like it's fixing the symptom rather than the underlying issue?

This is also my experience when you haven't setup a proper system prompt to address this for everything an LLM does. Funniest PRs are the ones that "resolves" test failures by removing/commenting out the test cases, or change the assertions. Googles and Microsofts models seems more likely to do this than OpenAIs and Anthropics models, I wonder if there is some difference in their internal processes that are leaking through here?

The same PR as the quote above continues with 3 more messages before the human seemingly gives up:

> please take a look

> Your new tests aren't being run because the new file wasn't added to the csproj

> Your added tests are failing.

I can't imagine how the people who have to deal with this are feeling. It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

Another PR: https://github.com/dotnet/runtime/pull/115732/files

How are people reviewing that? 90% of the page height is taken up by "Check failure", can hardly see the code/diff at all. And as a cherry on top, the unit test has a comment that say "Test expressions mentioned in the issue". This whole thing would be fucking hilarious if I didn't feel so bad for the humans who are on the other side of this.

>>diggan+L1
> I can't imagine how the people who have to deal with this are feeling. It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

That comparison is awful. I work with quite a few Junior developers and they can be competent. Certainly don't make the silly mistakes that LLMs do, don't need nearly as much handholding, and tend to learn pretty quickly so I don't have to keep repeating myself.

LLMs are decent code assistants when used with care, and can do a lot of heavy lifting, they certainly speed me up when I have a clear picture of what I want to do, and they are good to bounce off ideas when I am planning for something. That said, I really don't see how it could meaningfully replace an intern however, much less an actual developer.

>>surgic+68
These GH interactions remind me of one of those offshore software outsourcing firms on Upwork or Freelancer.com that bid $3/hr on every project that gets posted. There's a PM who takes your task and gives it to a "developer" who potentially has never actually written a line of code, but maybe they've built a WordPress site by pointing and clicking in Elementor or something. After dozens of hours billed you will, in fact, get code where the new file wasn't added to the csproj or something like that, and when you point it out, they will bill another 20 hours, and send you a new copy of the project, where the test always fails. It's exactly like this.

Nice to see that Microsoft has automated that, failure will be cheaper now.

>>safety+la
This gives me flashbacks to when my big corporate former employer outsourced a bunch of work offshore.

An outsourced contractor was tasked with a very simple job as their first task - update a single dependency, which required just a bump of the version and no code changes - after three days of them seemingly struggling to even understand what they were asked to do, inability to clone the repo, failure to install the necessary tooling on their machine, they ended up getting fired from the project. Complete waste of money, and the time of those of us having to delegate and review this work.

>>dkdbej+gd
Makes me wonder if the pattern will continue to follow, and we start to find certain agents—maybe due to config, maybe due to the training codebase and the codebase they're pointed at—that will become the single one out of the group we can rely on.

Give instructions, get good code back. That's the dream, though I think the pieces that need to fall into place for particular cases will prevent reaching that top quality bar in the general case.

>>98code+HB
Yeah, this is what we used to call "hiring." People who think it can ever come with guarantees make incompetent and tiresome clients.

I can't wait for the first AI agent programmer to realize this and start turning down jobs working for garbage people...or exploiting them at scale for pennies each, in a labor version of the "salami slicing" scheme. I don't mean humans using AI to do this, which of course has been at scale for years. I mean the first agent to discover a job prioritization heuristic on its own which leads to the same result.

zlacker