People were making all sorts of statements like:
- “I cloned it and there were loads of compiler warnings”
- “the commit build success rate was a joke”
- “it used 3rd party libs”
- “it is AI slop”
What they all seem to be just glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24hr/day.
If you are hung up on commit build quality, or code quality, you are completely missing the point, and I fear for your job prospects. These things will get better; they will get safer as the workflows get tuned; they will scale well beyond any of us.
Don’t look at where the tech is. Look where it’s going.
No one is hung up on the quality, but there is a ground truth here: something either "compiles" or it "doesn't". No one is gonna claim a software project was successful if the end artifact doesn't compile.
Correct, but Gas Town [1] already happened and, what's more, _actually worked_, so this experiment is both useless (because it doesn't demonstrate working software) _and_ derivative (because we've already seen that you can set up a project where, for about what a single developer costs, you can churn out more code than any human could read in a week).
Me neither, and I note so twice in the submission article. But I also didn't expect a project that, for the last 100+ commits, couldn't reliably be built, and therefore couldn't be tested or tried out.
This idea that quality doesn't matter is silly. Quality is critical for things to work, scale, and be extensible. By either LLMs or humans.
Am I misunderstanding this metaphor? Tsunamis pull the sea back before making landfall.
I'm sorry but what? Are you really trying to argue that it doesn't matter that nothing works, that all it produced is garbage and that what is really important is that it made that garbage really quickly without human oversight?
That's... that's not success.
I did read your post, and agree with what you're saying. It would be great if they pushed the agents to favour reliability or reproducibility, instead of just marching forwards.
Not everything needs to have, or should have, the same quality standards applied to it. For the purposes of the Cursor post, it doesn't bother me that most of the commits produced failed builds. I assume, from their post, that at some points it was capable of building and rendering the pages shown in the video on the post. That alone is the thing that I think is interesting.
Would I use this browser? Absolutely not. Do I trust the code? Not a chance in hell. Is that the point? No.
Sure, I don't care too much if the restaurant serves me food with silverware that is 18/10 vs 18/0 stainless steel, but I absolutely do care if I order a pizza and they just dump a load of gravel onto my plate and tell me it's good enough, and after all, quality isn't the point.
There are very few software development contexts where the quality metric of “does the project build and run at all” doesn’t matter quite a lot.
If the piece of shit can't even compile, it's equivalent to 0 lines of code.
> Don’t look at where the tech is. Look where it’s going.
Given that the people making the tech seem incapable of not lying, that doesn't give me hope for where it's going!
Look, I think AI and LLMs in particular are important. But the people actively developing them do not give me any confidence. And neither do comments like these. If I wanted to believe that all of this is in vain, I would just talk to people like you.
I can bang on a keyboard for a week and produce tons of text files - but if they don’t do anything useful, would you consider me a programmer?
The reason I have yet to publish a book is not because I can't write words. I got to 120k words or so, but they never felt like the right words.
Nobody's giving me (nor should they give me) a participation trophy for writing 120k words that don't form a satisfying novel.
Same's true here. We all know that LLMs can write a huge quantity of code. Thing is, so can:
yes 'printf("Hello World!");'
The hard part, the entire reason to either be afraid for our careers or thrilled we can switch to something more productive than being code monkeys for yet-another-CRUD-app (depending on how we feel), that's the specific test that this experiment failed at.