Scaling long-running autonomous coding

>>srames+(OP)
> I think somebody will have built a full web browser mostly using AI assistance, and it won’t even be surprising

> When I made my 2029 prediction this is more-or-less the quality of result I had in mind.

The author seems to be making a lot of allowances here.

So, it is seemingly impressive that someone was able to use agents to build a browser.

But they used trillions of tokens? This equates to millions of dollars of spend. Are we really happy with this?

The browser itself is not fully complete. There are rendering glitches noted in the article. So that's millions of dollars for something that has obvious bugs.

This is also pure agent code. Can a code base like this ever be maintained by a team of humans? Are you vendor-locked into a specific model if you want to build more features? How will support work? How will releases work? The lack of reflection on the rest of the software lifecycle beyond building is shocking.

So, after reflecting, I'm not sure any of this is impressive beyond "someone with unlimited tokens built a browser using AI agents". It's the same class of problem being solved over and over again. Nothing new is really being done here.

Maybe it's just me, but there's much more to software than just building.

>>tabs_o+xU3
If an AI system autonomously built a rocket and went to the moon, would you call it unimpressive because it's already been done? The moving of goalposts is shocking.

>>lateri+Yb4
As I explained elsewhere in this thread, the results here are more like trying to launch a rocket to the moon, unleashing AI on the problem, and settling for some kind of giant firecracker as a POC.

This isn't a POC web engine; it's throw-away code that can never scale to a full web engine.

So instead of wasting millions on this autonomous run, they should have put together a small team of people with some ideas on how to improve on existing web engines, and then given that team a large token budget for development. You could get a nice POC after a couple of weeks, and after a year or two of further iterations you might have something really interesting.

So this is a great example of how AI fails when left unsupervised; a more interesting experiment would be about how a small team can leverage AI to leapfrog Chromium; not in one week but in a year or two.
