Scaling long-running autonomous coding

>>srames+(OP)
> I think somebody will have built a full web browser mostly using AI assistance, and it won’t even be surprising

> When I made my 2029 prediction this is more-or-less the quality of result I had in mind.

The author seems to be making a lot of allowances and showing a lot of leniency here.

So, it is seemingly impressive that someone was able to use agents to build a browser.

But they used trillions of tokens? This equates to millions of dollars of spend. Are we really happy with this?

The browser itself is not fully complete. There are rendering glitches mentioned in the article. So that's millions of dollars for something that has obvious bugs.

This is also pure agent code. Can a code base like this ever be maintained by a team of humans? Are you locked in to a specific model vendor if you want to build more features? How will support work? How will releases work? The lack of reflection on any part of the software lifecycle other than building is shocking.

So after reflecting, I'm not sure whether any of this is impressive beyond "someone with unlimited tokens built a browser using AI agents". It's the same class of problem being solved over and over again. Nothing new is really being done here.

Maybe it's just me but there's much more to software than just building.

>>tabs_o+xU3
>But they used trillions of tokens? This equates to millions of dollars of spend. Are we really happy with this?

Yes, arguably $5 million is a fair price, and cheaper than what it would cost to pay humans.

>>simian+qo4
There is a problem with this comparison. The agent had access to open-source browsers in its training set. So you'd need to compare against the cost of creating an equivalent browser for a developer who has access to them, too. If all you need is standard browser functionality, you just use an existing browser. If you want to change some features or parts of the implementation, you fork it. A new browser written from scratch would be valuable if it had a novel implementation that resulted in a faster, more secure, more robust, more memory-efficient, or simply easier-to-use browser. So even if this had implemented the standard correctly, it wouldn't be worth more than the time it takes a developer to fork Chromium and change its name. Don't get me wrong, it's impressive, but less so once you consider that an LLM that regurgitated Chromium's code verbatim when tasked with building a browser would effectively have succeeded at the task.

EDIT: About the rendering speed: it doesn't really make sense to compare it with a fully functioning browser, since you could drop features or make bogus optimisations to go faster.
