Also, trying something new out will most likely have hiccups. Ultimately it may fail. But that doesn't mean it's not worth the effort.
The thing may rapidly evolve if it's being hard-tested on actual code and actual issues. For example, it will probably be changed so that it iterates until the tests actually run (and maybe some static checking can help it along, like making sure it doesn't delete tests; see the sketch below).
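To make that concrete, here is a minimal sketch of the kind of static check I mean. This is my own illustration, not anything Copilot actually does; it assumes a git checkout where origin/main is the merge base and test files live under paths containing "test":

    # Hypothetical CI step: fail if a PR deletes test files.
    # Assumes the PR branch is checked out and origin/main is the merge base.
    import subprocess
    import sys

    def deleted_test_files(base: str = "origin/main") -> list[str]:
        # --diff-filter=D lists files deleted relative to the base branch.
        out = subprocess.run(
            ["git", "diff", "--name-only", "--diff-filter=D", base, "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        return [path for path in out if "test" in path.lower()]

    if __name__ == "__main__":
        deleted = deleted_test_files()
        if deleted:
            print("PR deletes test files:", *deleted, sep="\n  ")
            sys.exit(1)  # block the merge until a human reviews the deletion
        print("No test files deleted.")

Run as a CI step, a non-zero exit would block the merge until a human looks at why the tests disappeared.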
Waiting to see what happens. I expect it will find its niche in development and become actually useful, taking menial tasks off developers' plates.
There's however a border zone that is "worse than failure": when the output looks good enough that the PRs get accepted, but they contain subtle issues that will bite you later.
Now when the management of your small or medium-sized business reads about CoPilot in some Executive Quarterly magazine and floats that brilliant idea internally, someone can quite literally point to these as real-world examples, let people analyze them, and pass the analysis up the management chain. Maybe that idea wasn't thought through all the way.
Businesses usually do their best to hide this sort of behavior in their applications, showcasing only nearly flawless functionality.
Reading AI-generated code is arguably far more annoying than any menial task, especially if said code happens to have subtle errors.
Speaking from experience.
However, every PR adds load and complexity to community projects.
As another commenter suggested, doing these kinds of experiments on separate forks sounds a bit less intrusive. That could be a takeaway from this experiment and set a good example.
There are many cool projects on GitHub that just accumulate PRs for years, until the maintainer ultimately gives up and someone forks the project and cherry-picks the working PRs. I've done that myself.
I'm super worried that we'll end up with more and more of these projects and abandoned forks :/
The joke is that PERL was a write-once, read-none language.
> Speaking from experience.
My experience is all code can have subtle errors, and I wouldn't treat any PR differently.
AI, however, is far more creative than any single given person.
That's my gut feeling anyway. I don't have numbers or any other rigorous data. I only know that Linus Torvalds made a very good point about the chain of trust. And I don't see myself ever trusting AI the way I can trust a human.
Reviewing what the AI does now is not comparable to reviewing human PRs. You are not doing the work as it will be expected in the (hopefully near?) future; you are training the AI and the AI's developers, and more crucially, you are digging out failure modes to fix.
It would definitely be nice to be wrong though. That'd make life so much easier.