zlacker

[return to "Watching AI drive Microsoft employees insane"]
1. rsynno+34[view] [source] 2025-05-21 11:44:25
>>laiysb+(OP)
Beyond every other absurdity here, well, maybe Microsoft is different, but I would never assign a PR that was _failing CI_ to somebody. That that's happening feels like an admission that the thing doesn't _really_ work at all; if it worked even slightly, it would at least only assign passing PRs, but presumably it's bad enough that if they put in that requirement there would be no PRs.
◧◩
2. sbarre+La[view] [source] 2025-05-21 12:36:01
>>rsynno+34
I feel like everyone is applying a worse-case narrative to what's going on here..

I see this as a work in progress.. I am almost certain the humans in the loop on these PRs are well aware of what's going on and have their expectations in check, and this isn't just "business as usual" like any other PR or work assignment.

This is a test. You can't improve a system without testing it on real world conditions.

How do we know they're not tweaking the Copilot system prompts and settings behind the scenes while they're doing this work?

Can no one see the possibility that what is happening in those PRs is exactly what all the people involved expected to have happen, and they're just going through the process of seeing what happens when you try to refine and coach the system to either success or failure?

When we adopted AI coding assist tools internally over a year ago we did almost exactly this (not directly in GitHub though).

We asked a bunch of senior engineers to see how far they could get by coaching the AI to write code rather than writing it themselves. We wanted to calibrate our expectations and better understand the limits, strengths and weaknesses of these new tools we wanted to adopt.

In most of those early cases we ended up with worse code than if it had been written by humans, but we learned a ton. We can also clearly see how much better things have gotten over time, since we have that benchmark to look back on.

◧◩◪
3. phkahl+dn[view] [source] 2025-05-21 14:05:06
>>sbarre+La
>> I see this as a work in progress.. I am almost certain the humans in the loop on these PRs are well aware of what's going on and have their expectations in check, and this isn't just "business as usual" like any other PR or work assignment.

>> This is a test. You can't improve a system without testing it on real world conditions.

Software developers know to fix build problems before asking for a review. The AIs are submitting PRs in bad faith because they don't know any better. Compilers and other build tools produce errors when they fail, and the AI is ignoring this first line of feedback.

It is not a maintainers job to review code for syntax errors, or use of APIs that don't actually exist, or other silly mistakes. That's the compilers job and it does it well. The AI needs to take that feedback and fix the issues before escalating to humans.

◧◩◪◨
4. sbarre+vn[view] [source] 2025-05-21 14:06:53
>>phkahl+dn
Like I said, I think you may be missing the point of the whole exercise.
◧◩◪◨⬒
5. agos+is5[view] [source] 2025-05-23 08:50:43
>>sbarre+vn
then the output of the exercise can only be "AI is ignoring errors"
[go to top]