zlacker

Scaling long-running autonomous coding

submitted by samwil+(OP) on 2026-01-14 22:18:04 | 290 points 197 comments

NOTE: showing posts with links only.
2. simonw+35 2026-01-14 22:37:31
>>samwil+(OP)
"To test this system, we pointed it at an ambitious goal: building a web browser from scratch."

I shared my LLM predictions last week, and one of them was that by 2029 "Someone will build a new browser using mainly AI-assisted coding and it won’t even be a surprise" https://simonwillison.net/2026/Jan/8/llm-predictions-for-202... and https://www.youtube.com/watch?v=lVDhQMiAbR8&t=3913s

This project from Cursor is the second attempt I've seen at this now! The other is this one: https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_chr...

6. ZitchD+76 2026-01-14 22:41:54
>>samwil+(OP)
I used similar techniques to build tjs [1], the world's fastest and most accurate JSON Schema validator, with magical TypeScript types. I learned a lot about autonomous programming. I found a similar "planner/delegate" pattern to work really well, using git subtrees to fan out work [2].

I think any large piece of software with well-established standards and test suites can be quickly rewritten and optimized by coding agents.

[1] https://github.com/sberan/tjs

[2] /spawn-perf-agents claude command: https://github.com/sberan/tjs/blob/main/.claude/commands/spa...
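A minimal sketch of the planner/delegate fan-out described above, assuming each delegated task can be worked on independently. `Task`, `plan`, `delegate` and `fan_out` are hypothetical names; the actual setup uses git subtrees and a Claude command, which this sketch does not model. Scoped threads stand in for agents, and the planner gets the results back in task order so it can review them.

```rust
use std::thread;

/// Hypothetical unit of work handed from the planner to a delegate.
#[derive(Debug, Clone)]
pub struct Task {
    pub id: usize,
    pub description: String,
}

/// Planner: split a goal into independent tasks (here, one per item).
pub fn plan(goal: &str, items: &[&str]) -> Vec<Task> {
    items
        .iter()
        .enumerate()
        .map(|(id, item)| Task {
            id,
            description: format!("{goal}: {item}"),
        })
        .collect()
}

/// Delegate: stand-in for an agent working on one task in isolation.
fn delegate(task: &Task) -> String {
    format!("task {} done ({})", task.id, task.description)
}

/// Fan out every task to its own worker thread, then fan the results back in,
/// preserving task order so the planner can review them.
pub fn fan_out(tasks: &[Task]) -> Vec<String> {
    thread::scope(|s| {
        let handles: Vec<_> = tasks.iter().map(|t| s.spawn(move || delegate(t))).collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

In a real setup each delegate would be an agent session working in its own subtree, and the planner would merge or reject its output; here `delegate` only formats a string.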

14. jphela+ga 2026-01-14 22:59:28
>>samwil+(OP)
This looks like extremely brittle code to my eyes. Look at https://github.com/wilsonzlin/fastrender/blob/main/crates/fa...

What is `FrameState::render_placeholder`?

```rust
impl FrameState {
  // ... (rest of the impl elided)

  pub fn render_placeholder(&self, frame_id: FrameId) -> Result<FrameBuffer, String> {
    let (width, height) = self.viewport_css;
    let len = (width as usize)
      .checked_mul(height as usize)
      .and_then(|px| px.checked_mul(4))
      .ok_or_else(|| "viewport size overflow".to_string())?;

    if len > MAX_FRAME_BYTES {
      return Err(format!(
        "requested frame buffer too large: {width}x{height} => {len} bytes"
      ));
    }

    // Deterministic per-frame fill color to help catch cross-talk in tests/debugging.
    let id = frame_id.0;
    let url_hash = match self.navigation.as_ref() {
      Some(IframeNavigation::Url(url)) => Self::url_hash(url),
      Some(IframeNavigation::AboutBlank) => Self::url_hash("about:blank"),
      Some(IframeNavigation::Srcdoc { content_hash }) => {
        let folded = (*content_hash as u32) ^ ((*content_hash >> 32) as u32);
        Self::url_hash("about:srcdoc") ^ folded
      }
      None => 0,
    };
    let r = (id as u8) ^ (url_hash as u8);
    let g = ((id >> 8) as u8) ^ ((url_hash >> 8) as u8);
    let b = ((id >> 16) as u8) ^ ((url_hash >> 16) as u8);
    let a = 0xFF;

    let mut rgba8 = vec![0u8; len];
    for px in rgba8.chunks_exact_mut(4) {
      px[0] = r;
      px[1] = g;
      px[2] = b;
      px[3] = a;
    }

    Ok(FrameBuffer {
      width,
      height,
      rgba8,
    })
  }
}
```

What is it doing in these diffs?

https://github.com/wilsonzlin/fastrender/commit/f4a0974594e3...

I'd be really curious to see the amount of work/rework over time, and the token/time cost for each additional actual completed test case.
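For readers trying to follow the quoted function, here is a stripped-down sketch of its two computational pieces: the overflow-checked buffer size and the per-frame fill color. It assumes `frame_id.0` is a `u64`, the URL hash a `u32` and the viewport dimensions `u32`; the quote shows none of these types, and `rgba8_len`, `fold_u64` and `fill_color` are hypothetical names.

```rust
/// Bytes needed for an RGBA8 buffer of `width` x `height` pixels, or `None` if
/// the multiplication would overflow `usize` (mirrors the quoted `checked_mul` chain).
pub fn rgba8_len(width: u32, height: u32) -> Option<usize> {
    (width as usize)
        .checked_mul(height as usize)
        .and_then(|px| px.checked_mul(4))
}

/// Fold a 64-bit hash into 32 bits by XOR-ing its high and low halves,
/// as the quoted `Srcdoc` arm does with `content_hash`.
pub fn fold_u64(hash: u64) -> u32 {
    (hash as u32) ^ ((hash >> 32) as u32)
}

/// Opaque fill color: each of R, G, B is one byte of the frame id XOR the
/// matching byte of the hash; alpha is always 0xFF.
pub fn fill_color(id: u64, hash: u32) -> [u8; 4] {
    [
        (id as u8) ^ (hash as u8),
        ((id >> 8) as u8) ^ ((hash >> 8) as u8),
        ((id >> 16) as u8) ^ ((hash >> 16) as u8),
        0xFF,
    ]
}
```

Because each channel combines one byte of the id with one byte of the hash, only the low 24 bits of each input reach the color, and different (id, hash) pairs can collide: `fill_color(5, 5)` and `fill_color(0, 0)` are both `[0, 0, 0, 0xFF]`.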

◧◩
19. blibbl+Ub 2026-01-14 23:07:31
>>jphela+ga
this is certainly an interesting way to pull out an attribute from a tag: https://github.com/wilsonzlin/fastrender/blob/main/crates/fa...

◧◩
60. kracke+iz 2026-01-15 01:33:09
>>sashan+u5
I'm interested in this too. I was expecting just a chromium reskin, but it does seem to be at least something more than that. >>46625189 claims it uses Taffy for CSS layout but the docs also claim "Taffy for flex/grid, native for tables/block/inline"

64. logica+cG 2026-01-15 02:17:21
>>samwil+(OP)
At the same time they were doing this, I also iterated on an AI-built web browser with around 2,000 lines of code. I was heavily in the loop for it, it didn't run autonomously. You can see the current version of the source code here:

https://taonexus.com/publicfiles/jan2026/172toy-browser.py.t... (turn the sound down, it's a bit loud if you interact with the built-in Tetris clone.)

You can run it after installing the packages, "pip install requests pillow urllib3 numpy simpleaudio"

I livestreamed the latest version here two weeks ago; it's a ten-minute video:

https://www.youtube.com/watch?v=4xdIMmrLMLo&t=45s

I'm posting from that web browser. As an easter egg, mine has a cool Tetris clone (called Pentrix) based on pieces with five segments; the button for it is at the upper right.

If you have any feature suggestions for what you want in a browser, please make them here:

https://pollunit.com/polls/ahysed74t8gaktvqno100g

◧◩◪◨
76. dang+YX 2026-01-15 05:02:50
>>geeuni+cb
Please don't cross into personal attack on HN.

https://news.ycombinator.com/showhn.html

◧◩
85. jkelle+0b1 2026-01-15 07:01:58
>>tehsau+yO
WGPU for rendering, winit for windowing, Servo's CSS engine and Taffy for layout sounds eerily similar to our existing open-source Rust browser, Blitz.

https://github.com/dioxuslabs/blitz

Maybe we ended up in the training data!

◧◩
117. keepam+cz1 2026-01-15 10:21:13
>>simonw+35
That makes a lot of sense for massive-scale efforts like a browser: using coordinated agents to push toward a huge, well-defined target with existing benchmarks and tests.

My angle has been a bit different: scaling autonomous coding for individual developers, and in a much simpler way. I love CLI agents, but I found myself wasting time babysitting terminals while waiting for turns to finish. At some point it clicked: what if I could just email them?

Email sounds backward, but that’s the feature. It’s universal, async, already collaborative. The agent sends me a focused update, I reply with guidance, and it keeps working on a server somewhere, or my laptop, while I’m not glued to my desk. There’s still a human in the loop, just without micromanagement.

It’s been surprisingly joyful and productive, and it feels closer to how real organizations already work. I’ve put together a small, usable tool around this and shared it here if anyone wants to try it or kick the tires: >>46629191

◧◩◪◨
120. underd+Dz1 2026-01-15 10:24:47
>>embedd+bu1
Looks like it doesn't compile for at least one other guy (I myself haven't tried): https://github.com/wilsonzlin/fastrender/issues/98

Yeah, answers need to be given.

◧◩◪
123. LiamPo+2B1 2026-01-15 10:37:58
>>qingch+SY
> "It's a compiler bug" is more of a joke than a real issue

It's a very real issue; people just seem to assume their code is wrong rather than the compiler. I've personally reported 12 GCC bugs over the last 2 years, and there are currently 1239 open wrong-code bugs.

Here's an example of a simple one in the C frontend that has existed since GCC 4.7: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105180

◧◩
130. afishh+zN1 2026-01-15 12:24:40
>>simonw+35
> The other is this one: https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_chr...

I took a 5-minute look at the layout crate here and... it doesn't look great:

1. The line-height calculation is suspicious; the structure of the implementation also suggests inline spans aren't handled even remotely correctly.

2. Uhm... where is the bidi? Directionality has far-reaching implications for an inline layout engine's design. This is not it.

3. It doesn't even consider itself a real engine:

        // Estimate text width (rough approximation: 0.6 * font_size * char_count)
        // In a real implementation, this would use font metrics
        let char_count = text.chars().count() as f32;
        let avg_char_width = font_size * 0.5; // Approximate average character width
        let text_width = char_count * avg_char_width;

I won't even begin talking about how this particular aspect that it "approximates" also has far-reaching implications for your design...
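To make the objection concrete: the quoted heuristic multiplies the character count by a fixed fraction of the font size (the comment says 0.6, the code uses 0.5), whereas a real engine sums per-glyph advance widths taken from the font. The sketch below contrasts the two, assuming a hypothetical advance table in font units and hypothetical function names; it still ignores shaping, kerning, ligatures and bidi, all of which a real inline layout engine has to handle.

```rust
use std::collections::HashMap;

/// The quoted heuristic: every character is assumed to be half the font size wide.
pub fn approx_width(text: &str, font_size: f32) -> f32 {
    text.chars().count() as f32 * font_size * 0.5
}

/// Sum per-character advance widths given in font units, then scale to pixels.
/// `advances` is a hypothetical table standing in for real font metrics; a
/// character missing from it falls back to `default_advance`.
pub fn measured_width(
    text: &str,
    font_size: f32,
    advances: &HashMap<char, f32>,
    default_advance: f32,
    units_per_em: f32,
) -> f32 {
    let total_units: f32 = text
        .chars()
        .map(|c| advances.get(&c).copied().unwrap_or(default_advance))
        .sum();
    total_units * font_size / units_per_em
}
```

With a hypothetical 1000-unit em, `i` at 250 units and `W` at 1000 units, at 16px "iii" measures 12px and "WWW" 48px, while the heuristic gives 24px for both.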

I could probably go on in perpetuity about the things wrong with this, even test it myself or something. But that's a waste of time I'm not undertaking.

Making a "browser" that renders a few particular web pages "correctly" is an order of magnitude easier than a browser that also actually cares about standards.

If this is how "A Browser for the modern age." looks then I want a time machine.

◧◩◪◨⬒
149. thesz+rz2 2026-01-15 15:57:56
>>torgin+cc1
Skilled devs compress, not generate (expand).

https://www.youtube.com/watch?v=8kUQWuK1L4w

The "discoverer" of APL tried to express as many problems as he could with his notation. At first he found that the notation expanded, and after some more expansion he found that it began to shrink.

The same goes for Forth, which provides the means for a Sequitur-compressed [1] representation of a program.

[1] https://en.wikipedia.org/wiki/Sequitur_algorithm

Myself, I always strive to delete some code or to replace some code with a shorter version: first, to understand it better; second, so that when I come back to it, I have less to read.
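A rough illustration of grammar-based compression by digram replacement. This is not Sequitur itself, which builds its grammar online while enforcing digram uniqueness and rule utility; it is the simpler offline variant usually called Re-Pair, which repeatedly replaces the most frequent adjacent pair with a new rule. `Sym`, `repair` and `expand` are hypothetical names. The result is a shorter sequence plus rules that expand back to the input, loosely analogous to factoring repeated code into named Forth words.

```rust
use std::collections::BTreeMap;

/// A grammar symbol: either an original byte (terminal) or a reference to rule `n`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Sym {
    T(u8),
    R(usize),
}

/// Offline digram replacement (Re-Pair style): while some adjacent pair occurs
/// at least twice without overlapping, replace it everywhere with a new rule.
/// Returns the compressed sequence and the rules, where rule `n` is `rules[n]`.
pub fn repair(input: &[u8]) -> (Vec<Sym>, Vec<(Sym, Sym)>) {
    let mut seq: Vec<Sym> = input.iter().map(|&b| Sym::T(b)).collect();
    let mut rules: Vec<(Sym, Sym)> = Vec::new();
    loop {
        // Count non-overlapping occurrences of each pair, scanning left to right
        // (so "aaa" contains the pair (a, a) once, not twice).
        let mut counts: BTreeMap<(Sym, Sym), usize> = BTreeMap::new();
        let mut last_start: BTreeMap<(Sym, Sym), usize> = BTreeMap::new();
        for i in 0..seq.len().saturating_sub(1) {
            let pair = (seq[i], seq[i + 1]);
            if i > 0 && last_start.get(&pair) == Some(&(i - 1)) {
                continue;
            }
            last_start.insert(pair, i);
            *counts.entry(pair).or_insert(0) += 1;
        }
        // Pick the most frequent repeating pair; BTreeMap keeps ties deterministic.
        let best = counts
            .into_iter()
            .filter(|&(_, count)| count >= 2)
            .max_by_key(|&(_, count)| count);
        let Some((pair, _)) = best else {
            break; // no pair repeats, so a new rule would save nothing
        };
        let rule = Sym::R(rules.len());
        rules.push(pair);
        // Rewrite the sequence, replacing occurrences left to right.
        let mut next = Vec::with_capacity(seq.len());
        let mut i = 0;
        while i < seq.len() {
            if i + 1 < seq.len() && (seq[i], seq[i + 1]) == pair {
                next.push(rule);
                i += 2;
            } else {
                next.push(seq[i]);
                i += 1;
            }
        }
        seq = next;
    }
    (seq, rules)
}

/// Expand a compressed sequence back into the original bytes.
pub fn expand(seq: &[Sym], rules: &[(Sym, Sym)]) -> Vec<u8> {
    fn go(sym: Sym, rules: &[(Sym, Sym)], out: &mut Vec<u8>) {
        match sym {
            Sym::T(byte) => out.push(byte),
            Sym::R(n) => {
                go(rules[n].0, rules, out);
                go(rules[n].1, rules, out);
            }
        }
    }
    let mut out = Vec::new();
    for &sym in seq {
        go(sym, rules, &mut out);
    }
    out
}
```

For example, `repair(b"aaaa")` yields the sequence `[R(0), R(0)]` with the single rule `R(0) = (T(b'a'), T(b'a'))`.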

◧◩
150. cube00+4D2 2026-01-15 16:09:13
>>simonw+35
On Jan 1 2026

> Given how badly my 2025 predictions aged I'm probably going to sit that one out! [1]

Seven days later you appear on the same podcast you appeared on in 2025 to share your LLM predictions for 2026.

What changed?

[1]: >>46450269

◧◩◪◨⬒⬓⬔
158. micimi+mS2 2026-01-15 17:03:27
>>ianbut+3v
Easiest to have different agents or turns that set aside the top-level goal via hooks/skills/manual prompt/etc. Heuristically, a human will likely ignore a lot of warnings until they've wired up the core logic, then go back and re-evaluate, but we still have to apply steering to get that kind of higher-order cognitive pattern.

Product is still fairly beta, but in Sculptor[^1] we have an MCP that provides agent & human with suggestions along the lines of "the agent didn't actually integrate the new module" or "the agent didn't actually run the tests after writing them." It leads to some interesting observations & challenges: the agents still really like ignoring tool calls compared to human messages b/c they "know better" (and sometimes they do).

[^1]: https://imbue.com/sculptor/

166. thesur+Yk4 2026-01-15 23:56:36
>>samwil+(OP)
Pretty cool and related to another path of work I'm following from Steve Yegge: https://medium.com/@steve-yegge/welcome-to-gas-town-4f25ee16...

◧◩◪◨
173. logica+yk6 2026-01-16 17:13:34
>>PaulHo+eU2
Thank you for the detailed feedback, though we would prefer for you to comment on the announcement threads where you see it. We really appreciate the feedback.

You're referring to State of Utopia's[1] web browser, currently available here:

https://taonexus.com/publicfiles/jan2026/172toy-browser.py.t... (turn the volume down if you play the included easter egg mini-game as it's very loud.)

10-minute livestream demonstration:

https://www.youtube.com/watch?v=4xdIMmrLMLo&t=45s

That livestream demonstration is side-by-side with Chrome, rendering very simple pages.

It compiles, renders simple web pages and is able to post.

The differences between cursor's browser and our browser:

    - Cursor's long-running autonomously coded browser: over a million lines of code and a trillion tokens, which is computationally intensive and has a high cost.
    - State of Utopia's browser: under 3000 lines of code.

    - Cursor's browser: does not compile at present. There's no way to use it.
    - State of Utopia's browser: compiles in every version. You can use it right away, and it includes a fun easter-egg game.

    - Cursor's browser: can't make form submissions.
    - State of Utopia's browser: can make form submissions.

I'm submitting this using that browser. (I don't know if it will really post or not.)

We are taking feature requests!! Submit your requested feature here:

https://pollunit.com/polls/ahysed74t8gaktvqno100g

We are happy to put any feature you want into the web browser.

[1] will be available at https://stateofutopia.com or https://stofut.com for short (St. of Ut.)

◧◩◪◨
174. troupo+Qv6 2026-01-16 18:02:33
>>simonw+Y43
> The fact that Firefox and Chrome and WebKit are likely buried in the training data somewhere might help them a bit, but it still looks to me more like an independent implementation that's influenced by those and many other sources.

They generate a statistically appropriate token based on a very small context window. And they are slightly nerfed not to reproduce everything verbatim because that would bring all sorts of lawsuits.

Of course they are not reproducing Webkit or Blink or Firefox verbatim. However, it's not an "independent implementation". That's why it's "stringing together a bunch of open-source components": >>46649586

Edit: also, this "independent implementation" cannot be compiled by their own CI and doesn't work, apparently.

◧◩◪◨
179. neuron+zf7 2026-01-16 21:25:20
>>fwip+r47
Looks like Cursor Agent was at least somewhat involved: https://github.com/wilsonzlin/fastrender/commit/4cc2cb3cf0bd...

◧◩◪◨⬒
180. embedd+Ai7 2026-01-16 21:40:50
>>neuron+zf7
Looks like a bunch of different users (including Google's Jules, which made one commit) have been contributing to the codebase, and the recent "fixes" include switching between various git users. https://gist.github.com/embedding-shapes/d09225180ea3236f180...

This to me seems to raise more questions than it answers.

◧◩
183. snowmo+Px7 2026-01-16 23:24:32
>>simonw+35
Well, it doesn't surprise me that this project is just a non-compiling clone of an existing browser. Says a lot about AI in general, don't you think? >>46649046

◧◩◪
190. polygl+k1m 2026-01-21 18:06:05
>>wilson+JS6
> there are real complex systems being engineered towards the goal of a browser engine, even if not there yet.

In various comments in >>46624541 I have explained at length why your fleet of autonomous agents failed miserably at building something that could be seen as a valid POC.

One example: your rendering loop does not follow the web specs and makes no sense.

https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...

The above design document is simply nonsense; typical AI hallucinated BS. Detailed critique at >>46705625
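For context on what "following the web specs" means for a rendering loop, here is a heavily simplified sketch of one event-loop iteration in the order the HTML Standard's processing model uses: run one task, perform a microtask checkpoint, then, only when there is a rendering opportunity, run animation frame callbacks before updating the rendering. `EventLoop` and its methods are hypothetical; the real "update the rendering" step has many more sub-steps (resize, scroll, media queries, animations, observers) that are omitted here, and style and layout are collapsed into a single log entry.

```rust
use std::collections::VecDeque;

/// Hypothetical, heavily simplified model of a browser event loop. Each queued
/// item is just a name; running it appends a line to `log`.
#[derive(Default)]
pub struct EventLoop {
    tasks: VecDeque<String>,
    microtasks: VecDeque<String>,
    animation_frame_callbacks: Vec<String>,
    pub log: Vec<String>,
}

impl EventLoop {
    pub fn queue_task(&mut self, name: &str) {
        self.tasks.push_back(name.to_string());
    }

    pub fn queue_microtask(&mut self, name: &str) {
        self.microtasks.push_back(name.to_string());
    }

    pub fn request_animation_frame(&mut self, name: &str) {
        self.animation_frame_callbacks.push(name.to_string());
    }

    /// One iteration: a single task, then the microtask checkpoint, then
    /// (only if there is a rendering opportunity) the rendering update.
    pub fn run_iteration(&mut self, rendering_opportunity: bool) {
        if let Some(task) = self.tasks.pop_front() {
            self.log.push(format!("task:{task}"));
        }
        // Microtask checkpoint: drain the whole microtask queue.
        while let Some(microtask) = self.microtasks.pop_front() {
            self.log.push(format!("microtask:{microtask}"));
        }
        if rendering_opportunity {
            // Snapshot the callback list so callbacks requested from now on wait
            // for the next frame, then run them before style, layout and paint.
            for callback in std::mem::take(&mut self.animation_frame_callbacks) {
                self.log.push(format!("raf:{callback}"));
            }
            self.log.push("style+layout".to_string());
            self.log.push("paint".to_string());
        }
    }
}
```

Queuing a "click" task, a "promise" microtask and a "draw" animation frame callback, then calling `run_iteration(true)`, logs them in that order followed by style/layout and paint; an iteration with empty queues and no rendering opportunity logs nothing.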

The actual code is worse; I can only describe it as a tangle of spaghetti. As a browser expert, I can't make much, if anything, out of it. In comparison, when I look at code in Ladybird, a project I am not involved in, I can instantly find my way around because I know the web specs.

So I agree this isn't just wiring up of dependencies, and neither is it copied from existing implementations: it's a uniquely bad design that could never support anything resembling a real-world web engine.

Now don't get me wrong, I do think AI could be leveraged to build a web engine, but not by unleashing autonomous agents. You need humans in the loop at all levels of abstraction; the agents should only be used to bang out features reusing patterns established or vetted by human experts.

If you want to do this the right way, get in touch: https://github.com/gterzian

◧◩◪
192. polygl+6am 2026-01-21 18:45:11
>>ben_w+fv1
> what matters in the end is what the code does, not what it looks like

That is true in a way, although even for agents readability matters.

But the code here does not actually do the right thing, and the way it is written also means it never could.

Web devs do care whether the engine runs their code according to Web standards (otherwise it's early IE all over again), and end users do care that websites work as their devs intended.

The current state is throwaway-level quality.

I've critiqued it at length in the other post, see >>46705625

◧◩
193. polygl+Jbm 2026-01-21 18:53:49
>>sashan+u5
I've done this in the parallel post, see >>46705625 (and a couple of other replies in that thread)

TL;DR: the code is not a valid POC but throwaway-level quality that could never support a functioning web engine. It's very clearly hallucinated AI BS, which is what you get when you don't have a human expert in the loop.

I actually like using AI, but only to save me the typing.

◧◩◪◨
194. tocsa+V3x 2026-01-25 07:27:47
>>xmprt+W8
Even though several people have already pointed out how complex a browser is, I must add one more take and bring up one of my all-time favorite blog posts, from 2000 (I am old), when browsers were already complex: Joel Spolsky's Joel on Software post "Things You Should Never Do, Part I" https://www.joelonsoftware.com/2000/04/06/things-you-should-... His first example was the Netscape browser v6.0: why there was no v5.0 after v4.0, and why it took three years. "They did it by making the single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch." I think this blog post is very relevant here.