zlacker

My AI skeptic friends are all nuts

submitted by tablet+(OP) on 2025-06-02 21:09:53 | 2356 points 2689 comments
[view article] [source] [go to bottom]

NOTE: showing posts with links only [show all posts]
26. fellow+52[view] [source] 2025-06-02 21:24:37
>>tablet+(OP)
I think the hardest part is not spending the next 3 months of my life in a cave finishing all the hobby/side projects I didn't quite get across the line.

It really does feel like I've gone from being 1 senior engineer to a team that has a 0.8 Sr. Eng, 5 Jrs., and one dude who spends all his time digging through poorly documented open source projects and documenting them for the team.

Sure, I can't spend quite as much time working on hard problems as I used to, but no one knows that. I haven't talked to a PM in months, I haven't written a commit summary in months - it's just been my AI doppelgangers. Compared to myself a year ago, I think I now PERSONALLY write 150% more HARD code than I did before. So maybe my first statement about being 0.8 is false.

I think of it like electric bikes: there seems to be some indication that people with electric-assist bikes actually burn more calories, spend more time riding, and go farther than those who have manual bikes https://www.sciencedirect.com/science/article/abs/pii/S22141....

◧◩◪
101. oblio+y4[view] [source] [discussion] 2025-06-02 21:37:00
>>Philpa+s3
Are you, though? Reading code is harder, potentially much harder.[1]

And I suspect the act of writing it yourself imparts some lower level knowledge you don't get by skimming the output of an AI.

[1] https://www.joelonsoftware.com/2000/05/26/reading-code-is-li...

◧◩
107. stock_+F4[view] [source] [discussion] 2025-06-02 21:37:56
>>pie_fl+33
I read a study[1] (caveat: not yet peer reviewed, I don't think) that seems to imply that you are correct.

  > When using GenAI tools, the effort invested in critical thinking
  > shifts from information gathering to information verification;
  > from problem-solving to AI response integration; and from task
  > execution to task stewardship.
[1]: https://www.microsoft.com/en-us/research/wp-content/uploads/...
◧◩
112. albert+R4[view] [source] [discussion] 2025-06-02 21:39:10
>>retrac+J
That's not quite true. The state of the art in both speech recognition and translation is still a dedicated model trained for that task alone, although the gap is getting smaller and smaller, and it also depends heavily on who invests how much training budget.

For example, for automatic speech recognition (ASR), see: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

The current best ASR model has 600M params (tiny compared to LLMs, way faster than any LLM: 3386.02 RTFx vs 62.12 RTFx, and much cheaper) and was trained on 120,000h of speech. In comparison, the next best speech LLM (quite close in WER, but slightly worse) has 5.6B params and was trained on 5T tokens, 2.3M speech hours. It has always been like this: for a fraction of the cost, you get a pure ASR model which still beats every speech LLM.

The same is true for translation models, at least when you have enough training data, so for popular translation pairs.

However, LLMs are obviously more powerful in what they can do beyond just speech recognition or translation.
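
If you want to poke at a dedicated ASR model yourself, here is a minimal sketch using the Hugging Face transformers pipeline (the checkpoint name is just an illustrative one, not the leaderboard leader):

  # Minimal sketch: run a dedicated ASR model locally.
  # The checkpoint name is illustrative; substitute any
  # leaderboard model that transformers supports.
  from transformers import pipeline

  asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
  print(asr("sample.wav")["text"])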

◧◩
122. jsnell+c5[view] [source] [discussion] 2025-06-02 21:40:46
>>mjburg+q4
Not exactly what you're asking for, but >>44159166 from today is not a junior web programmer working through the backlog, and the commit history contains all the prompts.
◧◩◪
139. Beetle+U5[view] [source] [discussion] 2025-06-02 21:44:39
>>albert+R4
It's not the speech recognition model alone that's fantastic. It's coupling it to an LLM for cleanup that makes all the difference.

See https://blog.nawaz.org/posts/2023/Dec/cleaning-up-speech-rec...

(This is not the best example as I gave it free rein to modify the text - I should post a followup that has an example closer to a typical use of speech recognition).

Without that extra cleanup, Whisper is simply not good enough.
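
A minimal sketch of that two-stage pipeline, assuming the openai-whisper package and an OpenAI-style chat API (model names are placeholders):

  # Stage 1: raw transcription with Whisper.
  import whisper
  from openai import OpenAI

  raw = whisper.load_model("base").transcribe("dictation.mp3")["text"]

  # Stage 2: LLM cleanup pass over the raw transcript.
  client = OpenAI()  # expects OPENAI_API_KEY in the environment
  resp = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder; any capable chat model works
      messages=[
          {"role": "system",
           "content": "Fix punctuation and obvious mis-recognitions "
                      "in this transcript. Do not change the meaning."},
          {"role": "user", "content": raw},
      ],
  )
  print(resp.choices[0].message.content)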

◧◩
140. Barrin+V5[view] [source] [discussion] 2025-06-02 21:44:51
>>gdubs+Z
>simple fact that you can now be fuzzy with the input you give a computer, and get something meaningful in return

I got into this profession precisely because I wanted to give precise instructions to a machine and get exactly what I want. Worth reading Dijkstra, who anticipated this, and the foolishness of it, half a century ago:

"Instead of regarding the obligation to use formal symbols as a burden, we should regard the convenience of using them as a privilege: thanks to them, school children can learn to do what in earlier days only genius could achieve. (This was evidently not understood by the author that wrote —in 1977— in the preface of a technical report that "even the standard symbols used for logical connectives have been avoided for the sake of clarity". The occurrence of that sentence suggests that the author's misunderstanding is not confined to him alone.) When all is said and told, the "naturalness" with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.[...]

It may be illuminating to try to imagine what would have happened if, right from the start our native tongue would have been the only vehicle for the input into and the output from our information processing equipment. My considered guess is that history would, in a sense, have repeated itself, and that computer science would consist mainly of the indeed black art how to bootstrap from there to a sufficiently well-defined formal system. We would need all the intellect in the world to get the interface narrow enough to be usable"

Welcome to prompt engineering and vibe coding in 2025, where you have to argue with your computer to produce a formal language - which we invented in the first place so as not to have to argue in imprecise language.

https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...

◧◩
150. JoshTr+h6[view] [source] [discussion] 2025-06-02 21:46:41
>>pie_fl+33
Exactly. See also https://hazelweakly.me/blog/stop-building-ai-tools-backwards... for a detailed look at this aspect of AI coding.
176. kopecs+d7[view] [source] 2025-06-02 21:51:51
>>tablet+(OP)
> Meanwhile, software developers spot code fragments seemingly lifted from public repositories on Github and lose their shit. What about the licensing? If you’re a lawyer, I defer. But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.

Seriously? Is this argument, in all earnestness, "No profession has been more contemptuous, therefore we should keep on keeping on"? Should we as an industry not bother to try and improve our ethics? Why don't we all just make munitions for a living and wash our hands of guilt because "the industry was always like this"?

Seems a bit ironic against the backdrop of <https://news.ycombinator.com/user?id=tptacek>:

> All comments Copyright © 2010, 2011, 2012, 2013, 2015, 2018, 2023, 2031 Thomas H. Ptacek, All Rights Reserved.

(although perhaps this is tongue-in-cheek given the last year)

233. thetwe+x9[view] [source] 2025-06-02 22:06:46
>>tablet+(OP)
The author posits that people don't like using LLMs with Rust because LLMs aren't good with Rust, and that people would therefore migrate towards languages that do well with LLMs. However, if that were true, Julia would be more popular, since LLMs do very well with it: https://www.stochasticlifestyle.com/chatgpt-performs-better-...
◧◩
243. rglove+R9[view] [source] [discussion] 2025-06-02 22:08:47
>>capnre+15
> So where will the experts come from?

They won't, save for a relative minority of those who enjoy doing things the hard way or those who see an emerging market they can capitalize on (slop scrubbers).

I wrote this post [1] last month to share my concerns about this exact problem. It's not that using AI is necessarily bad (I use it every day), but it disincentivizes real learning and competency. And once using AI is normalized to the point where true learning (not just outcome seeking) becomes optional, all hell will break loose.

> Perhaps there is another way to develop the skills

Like sticking a fork in a light socket, the only way to truly learn is to try it and see what happens.

[1] https://ryanglover.net/blog/chauffeur-knowledge-and-the-impe...

◧◩
245. hiAndr+U9[view] [source] [discussion] 2025-06-02 22:09:05
>>retrac+J
Definitely not. I took this same basic idea of feeding videos into Whisper to get SRT subtitles and took it a step further to make automatic Anki flashcards for listening practice in foreign languages [1]. I literally feel like I'm living in the future every time one of those cards from whatever silly Finnish video I found on YouTube pops up in my queue.

These models have made it possible to robustly practice all 4 quadrants of language learning for most common languages using nothing but a computer, not just passive reading. Whisper is directly responsible for 2 of those quadrants, listening and speaking. LLMs are responsible for writing [2]. We absolutely live in the future.

[1]: https://github.com/hiandrewquinn/audio2anki

[2]: https://hiandrewquinn.github.io/til-site/posts/llm-tutored-w...
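
The SRT step itself is tiny. A rough sketch of the idea, using openai-whisper's timestamped segments (audio2anki's actual pipeline may differ):

  # Sketch: turn Whisper's timestamped segments into an SRT file.
  import whisper

  def srt_time(t: float) -> str:
      h, rem = divmod(int(t), 3600)
      m, s = divmod(rem, 60)
      return f"{h:02}:{m:02}:{s:02},{int((t % 1) * 1000):03}"

  result = whisper.load_model("small").transcribe("clip.mp4")
  with open("clip.srt", "w", encoding="utf-8") as f:
      for i, seg in enumerate(result["segments"], start=1):
          f.write(f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n")
          f.write(seg["text"].strip() + "\n\n")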

◧◩◪◨
248. jshear+Y9[view] [source] [discussion] 2025-06-02 22:09:31
>>postal+w6
I have bad news about our illustrious hosts: https://www.ycombinator.com/companies?query=web3

They're not alone either, a bunch of the AI bankroll is coming from people who were also sold on crypto taking over the world.

312. matt_s+Sb[view] [source] 2025-06-02 22:21:54
>>tablet+(OP)
> This was the craftsman’s ‘Golden Age’ and much time and trouble was taken over the design of tools. Craftsmen were being called upon to do more skilful and exacting work and the use of tools and the interest in development had become very widespread.

Above pulled from A Brief History of the Woodworking Plane [0] - the plane being a woodworking tool that has evolved over 2,000 years. Now there are electric planers, handheld electric planers, and lots of heavy machinery that do the same thing in a very automated way. If a company is mass producing kitchen cabinets, they aren't hand planing edges on boards; a machine is doing all that work.

I feel like with AI we are on the cusp of moving beyond a "Golden age" and into an "industrial age" for coding, where it will become more important to have code that AI understands vs. something that is carefully crafted. Simple business pressure will demand it (whether we like it or not).

^ A comment I made just yesterday on a different thread.

For software developers, AI is like the cabinet maker getting a machine to properly mill and produce cabinet panels. Sure, you can use a hand plane to do that, but you're producing a very different product - and likely one that not many people will care about, possibly not even your employer when they see all the other wood shops pumping out cabinetry and taking their market share.

[0] https://www.handplane.com/879/a-brief-history-of-the-woodwor...

◧◩◪◨
329. albert+mc[view] [source] [discussion] 2025-06-02 22:24:27
>>pants2+Ba
Do you have any comparisons in terms of WER? I doubt that GPT-4o-transcribe is better than the best models from that leaderboard (https://huggingface.co/spaces/hf-audio/open_asr_leaderboard). A quick search on this got me here: https://www.reddit.com/r/OpenAI/comments/1jvdqty/gpt4otransc... https://scribewave.com/blog/openai-launches-gpt-4o-transcrib...

It is stated that GPT-4o-transcribe is better than Whisper-large. That might be true, but which version of Whisper-large exactly? Looking at the leaderboard, there are a lot of Whisper variants. But anyway, the best Whisper variant, CrisperWhisper, is currently only at rank 5. (I assume GPT-4o-transcribe was not compared to that but to some other Whisper model.)

It is stated that Scribe v1 from elevenlabs is better than GPT-4o-transcribe. In the leaderboard, Scribe v1 is also only at rank 6.

◧◩◪
339. worble+Dc[view] [source] [discussion] 2025-06-02 22:25:54
>>dmonit+b2
And of course, a lot of modern DVD players do not properly transmit closed captions as subtitles over HDMI, so that sure isn't helping.

A slightly off topic but interesting video about this https://www.youtube.com/watch?v=OSCOQ6vnLwU

367. nostra+Ed[view] [source] 2025-06-02 22:32:33
>>tablet+(OP)
Curious how he reconciles this:

> If you build something with an LLM that people will depend on, read the code. In fact, you’ll probably do more than that. You’ll spend 5-10 minutes knocking it back into your own style.

with Joel Spolsky's fundamental maxim:

> It’s harder to read code than to write it.

https://www.joelonsoftware.com/2000/04/06/things-you-should-...

◧◩◪
368. simonw+Fd[view] [source] [discussion] 2025-06-02 22:32:38
>>okanat+s8
"Nobody writes articles or does deep knowledge work with smartphones."

I don't think that's true.

I do most of my reading on a smart phone - including wading through academic papers, or reading full books in the kindle app and jotting down notes in the digital margins.

A sizable number of my short form blog entries are written on my phone, and my long form writing almost always starts out in Apple Notes on my phone before transferring to a laptop.

Predictive text and voice dictation have gotten good enough now that I suspect entire books have been written on mobile devices.

Whether you want to consider it "deep knowledge work" or not is up to you, but apparently a lot of Fifty Shades of Grey was written on a BlackBerry! https://www.huffpost.com/archive/ca/entry/fifty-shades-of-gr...

◧◩
389. steego+ee[view] [source] [discussion] 2025-06-02 22:35:52
>>grey-a+ba
Approximately speaking, what do you want to see put up?

I ask this because it reads like you have a specific challenge in mind when it comes to generative AI and it sounds like anything short of "proof of the unlimited powers" will fall short of being deemed "useful".

Here's the deal: Reasonable people aren't claiming this stuff is a silver bullet or a panacea. They're not even suggesting it should be used without supervision. It's useful when used by people who understand its limitations and leverage its strengths.

If you want to see how it's been used by someone who was happy with the results and willing to share them, you can scroll down a few stories on the front page and check the commit history of this project:

https://github.com/cloudflare/workers-oauth-provider/commits...

Now here's the deal: These people aren't trying to prove anything to you. They're just sharing the results of an experiment where a very talented developer used these tools to build something useful.

So let me ask you this: Can we at least agree that these tools can be of some use to talented developers?

◧◩
392. marxis+ne[view] [source] [discussion] 2025-06-02 22:36:28
>>grey-a+ba
I think we're talking past each other. There's always been a threshold: above it, code changes are worth the effort; below it, they sit in backlog purgatory. AI tools so far seem to lower implementation costs, moving the threshold down so more backlog items become viable. The "5x productivity" crowd is excited about this expanded scope, while skeptics correctly note the highest value work hasn't fundamentally changed.

I think what's happening is two groups using "productivity" to mean completely different things: "I can implement 5x more code changes" vs "I generate 5x more business value." Both experiences are real, but they're not the same thing.

https://peoplesgrocers.com/en/writing/ai-productivity-parado...

◧◩◪
395. anothe+te[view] [source] [discussion] 2025-06-02 22:36:44
>>dmonit+b2
Many DVDs of old movies and TV shows may contain the closed captions, but they are not visible through HDMI. You have to connect your DVD player to your TV via the composite video analogue outputs.

This video explains all about it: https://youtu.be/OSCOQ6vnLwU

◧◩◪
405. capnre+Oe[view] [source] [discussion] 2025-06-02 22:38:14
>>dimal+Fb
> Well, if you’re a novice, don’t do that.

I agree, and it sounds like you're getting great results, but they're all going to do it. Ask anyone who grades their homework.

Heck, it's even common among expert users. Here's a study that interviewed scientists who use LLMs to assist with tasks in their research: https://doi.org/10.1145/3706598.3713668

Only a few interviewees said they read the code through to verify it does what they intend. The most common strategy was to just run the code and see if it appears to do the right thing, then declare victory. Scientific codebases rarely have unit tests, so this was purely a visual inspection of output, not any kind of verification.

◧◩
426. manmal+xf[view] [source] [discussion] 2025-06-02 22:42:44
>>mjburg+q4
Here's a 3+h video of the PSPDFKit (Nutrient) founder vibe-coding a Mac app. Can be watched at 2x: https://steipete.me/posts/2025/the-future-of-vibe-coding?utm...
◧◩◪
436. taurat+Yf[view] [source] [discussion] 2025-06-02 22:45:13
>>mtklei+Ec
> I snapped a photo of a bistro's daily specials hand-written on a blackboard in Chinese, copied the text right out of the photo, translated it into English, learned how to pronounce the menu item I wanted, and ordered some dinner.

> Two years ago

This functionality was available in 2014, on either an iPhone or Android. I ordered specials in Taipei way before Covid. Here's the blog post celebrating it:

https://blog.google/products/translate/one-billion-installs/

This is all a post about AI, hype, and skepticism. In my childhood sci-fi, the idea of people working multiple jobs and still not being able to afford rent was written as shocking, or seen as dystopian. All this incredible technology is a double-edged sword, but it doesn't solve the problems of the day, only the problems of business efficiency, which exacerbates the problems of the day.

◧◩
456. Karrot+Rg[view] [source] [discussion] 2025-06-02 22:50:30
>>grey-a+ba
This 2-year-old goroutine pool implementation [1] is 95% GPT-generated and has commit history showing what GPT did. It's an older example, but it is one.

[1]: https://github.com/devchat-ai/gopool

◧◩
463. simonw+6h[view] [source] [discussion] 2025-06-02 22:51:42
>>thousa+z2
Programmers who think front end is "easier" than backend have been wrong for well over a decade. https://simonwillison.net/2012/Feb/13/why-are-front-end/
◧◩
487. cube22+ei[view] [source] [discussion] 2025-06-02 22:59:16
>>grey-a+ba
Maybe they're just busy shipping code instead of proving stuff to you, which they have nothing to gain from?

You set up a strawman (AI only companies, agents doing everything on their own) which is irrelevant to the point the article is making. One excerpt:

> Almost nothing it spits out for me merges without edits. I’m sure there’s a skill to getting a SOTA model to one-shot a feature-plus-merge! But I don’t care. I like moving the code around and chuckling to myself while I delete all the stupid comments. I have to read the code line-by-line anyways.

I think this article is very on point, I relate with basically every paragraph. It's not a panacea, it's not a 10x improvement by any means, but it's a very meaningful improvement to both productivity (less than 2x I'd say, which would already be a ton) and fun for me. As I've mentioned in the past here:

> I feel like there’s also a meaningful split of software engineers into those who primarily enjoy the process of crafting code itself, and those that primarily enjoy building stuff, treating the code more as a means to an end (even if they enjoy the process of writing code!). The former will likely not have fun with AI, and will likely be increasingly less happy with how all of this evolves over time. The latter I expect are and will mostly be elated.

which is a point the article makes too (tables), in a slightly different way.

Also, to be clear, I agree that 90% of the marketing around AI is overblown BS. But that's again beside the point, and the article is making no outlandish claims of that kind.

Overall, I hope this article (as intended) will make more people lose their dismissiveness and wake up their curiosity, as I expect the future of those who don't is akin to that of people today saying they're "not really good at computers". It's a paradigm shift, and it takes getting used to and becoming productive in, as some imo smart people are mentioning even in this thread[0].

[0]: >>44164039

489. onepre+gi[view] [source] 2025-06-02 22:59:31
>>tablet+(OP)
These systems do work, quite well - especially the latest release of Claude. Palantir is using AI to process your IRS information without your consent, https://archive.is/QAY9E. This should concern everybody. Not only can this information be used to determine specific account details; they can use this to target individuals who don't align with the current administration's agenda, as well as exclude hiring based on political alignment. They could also use this data to exclude and discriminate against those opposed to Peter Thiel's network-state agenda aka "Freedom Cities", but it seems everybody is complacent with that.
◧◩
509. simonw+hj[view] [source] [discussion] 2025-06-02 23:07:06
>>jleyan+Ph
Most of the big AI vendors have liability policies in place where their lawyers are committed to step in and help you out if someone sues you for copyright infringement based on code their LLMs wrote for you. I think the term they use is "copyright shield".

OpenAI has that as point 14 in https://openai.com/policies/services-agreement/

Anthropic mention theirs here: https://www.anthropic.com/news/expanded-legal-protections-ap...

Here's the Google post: https://cloud.google.com/blog/products/ai-machine-learning/p...

◧◩
510. bberen+kj[view] [source] [discussion] 2025-06-02 23:07:18
>>grey-a+ba
I build tools all the time using AI tools that would have taken me 10x longer before; a couple of examples below. The first was a literal one-shot: I dumped the full page HTML into Cursor and asked it for the feature I needed. Using it today, it already saved me over an hour of analysis time across the refunds I analyze. The second example includes using AI to better classify text and help our support people better use their time to help customers. I could probably have made the second thing without LLMs, but I didn't have the skills. Here I was able to provide concrete value to our company and our customers.

https://github.com/Atlas-Authority/mpac-ui-improved https://moduscreate.com/blog/forum-monitoring-is-essential-b... (Pardon how marketing keyword stuffed the final post)

◧◩◪
566. sethhe+1m[view] [source] [discussion] 2025-06-02 23:25:32
>>liveon+kk
I think tptacek is generally worth reading. He is one of the users with the highest karma on this site (https://news.ycombinator.com/user?id=tptacek)

I’m happy to have read this, which is reason enough to publish it - but also it’s clearly generating debate so it seems like a very good thing to have published.

577. 1a527d+zm[view] [source] 2025-06-02 23:28:51
>>tablet+(OP)
I'm a recent convert. I _was_ a sceptic until I wasn't (>>44059847).
596. thombl+rn[view] [source] 2025-06-02 23:34:32
>>tablet+(OP)
Hmm a bit of a tone shift from: https://fly.io/docs/about/open-source/

> Fly.io builds on the work of a tremendous open source infrastructure community. We want open source authors to benefit from their work because we think a healthy, thriving open source ecosystem will help us build better products.

To the fine article:

> Meanwhile, software developers spot code fragments seemingly lifted from public repositories on Github and lose their shit. What about the licensing? If you’re a lawyer, I defer. But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass.

◧◩
597. rukuu0+sn[view] [source] [discussion] 2025-06-02 23:34:36
>>grey-a+ba
Re put up or shut up - this is a decent example from the front page today: >>44159166
◧◩◪◨
634. Game_E+kp[view] [source] [discussion] 2025-06-02 23:49:35
>>malfis+Y5
What model and query did you use? I used the prompt "find me a toothpaste that is both SLS free and has fluoride" and both GPT-4o [0] and o4-mini-high [1] gave me correct first answers. The 4o answer used the newish "show products inline" feature, which made it easier to jump to each product and check it out (I am putting aside my fear this feature will end up killing their web product with monetization).

0 - https://chatgpt.com/share/683e3807-0bf8-800a-8bab-5089e4af51...

1 - https://chatgpt.com/share/683e3558-6738-800a-a8fb-3adc20b69d...

◧◩◪◨⬒
642. Game_E+zp[view] [source] [discussion] 2025-06-02 23:52:05
>>GoatIn+Wo
"Hello" toothpaste is ChatGPT's 2nd or 1st answer depending on which model I used [0], so I am curious for the poster above to share the session so we can see what the issue was.

There is known sensitivity (no pun intended ;) to wording of the prompt. I have also found if I am very quick and flippant it will totally miss my point and go off in the wrong direction entirely.

0 - >>44164633

◧◩◪◨
652. citize+Up[view] [source] [discussion] 2025-06-02 23:54:15
>>mjburg+47
I watched someone do a similar demonstration (>>44159166) live at an event. They ended up doing something like 3 pull requests to get the original change, then had to do 4 more to get it to fix and put back things it removed. Not exactly efficient, and it was painful to sit there thinking I could have had it done manually 20x by now while we waited for the AI to do the changes.

I've never been able to get it to work reliably myself either.

The internet just tells me to prompt harder. Lots of "grind-set" mentality energy around AI if you ask me. Very little substance.

◧◩◪
661. capnre+lq[view] [source] [discussion] 2025-06-02 23:57:36
>>jedber+Hi
For introductory problems, the kind we use to get students to understand a concept for the first time, the AI would likely (nearly) nail it on the first try. They wouldn't have to fix any non-working code. And annotating the code likely doesn't serve the same pedagogical purpose as writing it yourself.

Students emerge from lectures with a bunch of vague, partly contradictory, partly incorrect ideas in their head. They generally aren't aware of this and think the lecture "made sense." Then they start the homework and find they must translate those vague ideas into extremely precise code so the computer can do it -- forcing them to realize they do not understand, and forcing them to make the vague understanding concrete.

If they ask an AI to write the code for them, they don't do that. Annotating has some value, but it does not give them the experience of seeing their vague understanding run headlong into reality.

I'd expect the result to be more like what happens when you show demonstrations to students in physics classes. The demonstration is supposed to illustrate some physics concept, but studies measuring whether that improves student understanding have found no effect: https://doi.org/10.1119/1.1707018

What works is asking students to make a prediction of the demonstration's results first, then show them. Then they realize whether their understanding is right or wrong, and can ask questions to correct it.

Post-hoc rationalizing an LLM's code is like post-hoc rationalizing a physics demo. It does not test the students' internal understanding in the same way as writing the code, or predicting the results of a demo.

666. a_bono+vq[view] [source] 2025-06-02 23:58:23
>>tablet+(OP)
I find the Konwinski Prize to be very interesting in this context. 1 million dollars to whoever's open source LLM solves >90% of a set of novel Github issues.

https://www.kaggle.com/competitions/konwinski-prize/

Currently, the #1 spot sits at a score of 0.09, not 0.9. A far cry from being useful. I know that open source models are not as good as closed source, but still, we're a long way from LLMs being good for code on their own.

And that supports OP's point - these tools aren't AGI, they produce trash that needs evaluation, but they're still useful.

◧◩◪◨
675. pera+Qq[view] [source] [discussion] 2025-06-03 00:00:30
>>XorNot+rc
It's a bit more than a metaphor :) during the California gold rush there was this guy named Sam Brannan who sold shovels and other tools to miners, and made a fortune from it (he is often referred to as California's first millionaire). He also had a newspaper at the time, the California Star, which as you can imagine was used to promote the gold rush:

> The excitement and enthusiasm of Gold Washing still continues—increases. (1848)

https://sfmuseum.org/hist6/star.html

https://en.wikipedia.org/wiki/Samuel_Brannan

◧◩◪◨
685. simonw+mr[view] [source] [discussion] 2025-06-03 00:06:23
>>espere+Lo
tptacek is using Zed, which I've not tried myself.

I actually do most of my "agentic coding" (not a fan of the term, but whatever) in ChatGPT Code Interpreter, which hasn't changed much in two years other than massive upgrades to the model it uses - I run that mainly via o4-mini-high or o3 these days.

OpenAI's Codex is a leading new thing, but only if you pay $200/month for it. Google's equivalent https://jules.google/ is currently free.

GitHub Copilot gained an "agent mode" recently: https://github.blog/ai-and-ml/github-copilot/agent-mode-101-...

There's also Copilot Coding Agent, which is confusingly an entirely different product: https://github.blog/changelog/2025-05-19-github-copilot-codi...

◧◩◪◨
700. simonw+Vr[view] [source] [discussion] 2025-06-03 00:11:09
>>TheRoq+mp
"they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient"

That's mostly solved by the most recent ones that can run searches. I've had great results from o4-mini for this, since it can search for the latest updates - example here: https://simonwillison.net/2025/Apr/21/ai-assisted-search/#la...

Or for a lot of libraries you can dump the ENTIRE latest version into the prompt - I do this a lot with the Google Gemini 2.5 models since those can handle up to 1m tokens of input.
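
A rough sketch of the "dump the whole library" trick with the google-generativeai client (the path and model name here are placeholders):

  # Sketch: concatenate a library's source and ask a long-context
  # model about its latest APIs. Path and model name are placeholders.
  import pathlib
  import google.generativeai as genai

  genai.configure(api_key="...")  # your API key

  source = "\n\n".join(
      f"# --- {p} ---\n{p.read_text()}"
      for p in sorted(pathlib.Path("somelib").rglob("*.py"))
  )
  model = genai.GenerativeModel("gemini-2.5-flash")
  reply = model.generate_content(
      source + "\n\nUsing only the APIs shown above, write a usage example."
  )
  print(reply.text)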

"they fail at doing clean DRY practices" - tell them to DRY in your prompt.

"they bait me into inexisting apis, or hallucinate solutions or issues" - really not an issue if you're actually testing your code! I wrote about that one here: https://simonwillison.net/2025/Mar/2/hallucinations-in-code/ - and if you're using one of the systems that runs your code for you (as promoted in tptacek's post) it will spot and fix these without you even needing to intervene.

"they cannot properly pick the context and the files to read in a mid-size app" - try Claude Code. It has a whole mechanism dedicated to doing just that, I reverse-engineered it this morning: https://simonwillison.net/2025/Jun/2/claude-trace/

"they suggest to download some random packages, sometimes low quality ones, or unmaintained ones" - yes, they absolutely do that. You need to maintain editorial control over what dependencies you add.

◧◩◪◨
702. cesarb+Xr[view] [source] [discussion] 2025-06-03 00:11:20
>>Fridge+q3
> Is there a term for “skeptics just haven’t used it enough” argument?

It's not an exact match to what you want, but "you're holding it wrong" is the closest I've found. (For those too young to have heard of it, it was an infamous rebuttal to criticism of a particular model of the iPhone: https://en.wikipedia.org/wiki/iPhone_4#Antenna)

◧◩◪◨
711. simonw+ds[view] [source] [discussion] 2025-06-03 00:14:21
>>poinca+qr
I've never heard of Ghidra before but, in case you're interested, I ran that prompt through OpenAI's o3 and Anthropic's Claude Opus 4 for you just now (both of them the latest/greatest models from those vendors and new as of less than six months ago) - results here: https://chatgpt.com/share/683e3e38-cfd0-8006-9e49-2aa799dac4... and https://claude.ai/share/7a076ca1-0dee-4b32-9c82-8a5fd3beb967

I have no way of evaluating these myself so they might just be garbage slop.

◧◩◪◨
715. quanti+qs[view] [source] [discussion] 2025-06-03 00:15:52
>>palata+4k
Apparently 1/3 lb hamburgers didn't help A&W against McDonald's because too many people thought 1/3 is smaller than 1/4. So the Quarter Pounder remains supreme. Snopes: [https://www.snopes.com/news/2022/06/17/third-pound-burger-fr...]
726. keeda+Ls[view] [source] 2025-06-03 00:18:46
>>tablet+(OP)
A lot of the doubters are basically saying: "If it's so great, show me proof." While that is usually the right stance, I think it is short-sighted in times of revolutionary change.

To be sure, you definitely should not blindly trust the people who have a stake in selling AI products. However, you probably should trust the people who have no stake in selling AI, but are using it to accelerate themselves. The problem is, these people are probably too busy building things to spare any time convincing you. And in fact, it is a competitive advantage for them if others don't figure it out.

Here's my take: Everyone needs to figure out how AI works for them. The thing with AI is that it is a technology that behaves like people. It has the speed, efficiency and scale of computers but the fallibility and quirks of humans. This is why it's so confusing to discuss, and why folks have such varying experiences. The right approach to working with AI is like working with people: understand what they're good at and where they are weak, and then work with them to achieve your goals.

This will require using it "in anger" on many non-trivial tasks over a significant period of time, which will take persistence and patience. Yes, the hype is so over-the-top that you might not want to invest your valuable time on it. But you owe it to yourself to ensure you can fully capitalize on the sea change that is happening.

If it helps, I (like the OP) have no stake in selling AI, and I posted this comment about my experiences on the other thread about AI coding -- >>44163543 -- The upshot is, AI has enabled me to produce innovative (probably cutting edge) work in domains I had little prior background in. And I've not even used agents yet!

◧◩
730. ryandr+Rs[view] [source] [discussion] 2025-06-03 00:19:58
>>sneak+t
I hate how the discourse around LLM-assisted programing is so polarized. It's either detractors saying it's "a fad that's useless and going nowhere, wasting billions of megawatts every year" or it's true believers calling it "the most miraculous sea change technological advancement in my lifetime" or "more important than fire and electricity[1]." There just doesn't seem to be any room in the middle.

I tried out Copilot a few months back to see what all the fuss was about and so that I could credibly engage with discussions having actually used the technology. I'd rate it as "kind of neat-o" but not earth shattering. It was like the first time I used an IDE with auto-complete. Oh, cool, nice feature. Would I pay monthly for it? No way. Would I integrate it into my development workflow if it were free? Maybe, I guess? Probably wouldn't bother unless it came literally set up for me out of the box like autocomplete does nowadays.

Don't get me wrong--it's cool technology. Well done, AI people. Is it "the 2nd most important thing to happen over the course of my career" as OP wrote? Come on, let's come down to earth a little.

1: https://www.cnbc.com/2018/02/01/google-ceo-sundar-pichai-ai-...

◧◩◪◨⬒
731. simonw+Us[view] [source] [discussion] 2025-06-03 00:20:19
>>jppitt+0s
Yeah, definitely both.

New models come out all the time. One of the most interesting signals to look out for is when they tip over the quality boundary from "not useful at task X" to "useful at task X". It happened for coding about a year ago. It happened for search-based research assistants just two months ago, in my opinion - I wrote about that here: https://simonwillison.net/2025/Apr/21/ai-assisted-search/

◧◩◪◨⬒⬓
747. simonw+qt[view] [source] [discussion] 2025-06-03 00:24:00
>>esjeon+Is
That's because GPT-4o's training cut-off is Sep 30, 2023 (see https://platform.openai.com/docs/models/gpt-4o) and the OpenAI API has changed a LOT since then.

Claude 4 has a training cut-off of March 2025; I asked it something today about its own API and it gave me useful code.

◧◩
752. haiku2+Ft[view] [source] [discussion] 2025-06-03 00:25:52
>>metall+D1
Zed has a great four minute demo showing how it works: https://zed.dev/agentic

I personally have my Zed set up so the agent has to request every command be manually reviewed and approved before running.

◧◩◪◨
782. haiku2+Cu[view] [source] [discussion] 2025-06-03 00:35:02
>>lapcat+nt
> How does that work exactly? Do you have a link?

https://ollama.com lets you run models on your own hardware and serve them over a network. Then you point your editor at that server, e.g. https://zed.dev/docs/ai/configuration#ollama
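
Ollama exposes a small HTTP API, so anything can point at it, not just your editor. A minimal sketch (assumes you've already pulled the model, e.g. with "ollama pull llama3"):

  # Sketch: query a self-hosted Ollama server over its HTTP API.
  import requests

  resp = requests.post(
      "http://localhost:11434/api/generate",  # Ollama's default port
      json={
          "model": "llama3",
          "prompt": "Summarize what an LSP server does, in two sentences.",
          "stream": False,  # return one JSON object instead of a stream
      },
  )
  print(resp.json()["response"])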

◧◩◪◨⬒
787. j_w+Mu[view] [source] [discussion] 2025-06-03 00:35:46
>>rainsf+Fr
Using map reading skills as a proxy for this is a bit of a strawman. People who use GPS habitually have worse navigational and spatial awareness skills.

https://www.nature.com/articles/s41598-020-62877-0

If you habitually use a calculator for all arithmetic, could the result not be similar? What if you reach to an LLM for all your coding, general research, etc.? These tools may vastly speed up some workflows, but your brain is a muscle.

◧◩◪◨⬒
848. sensan+Cw[view] [source] [discussion] 2025-06-03 00:53:49
>>metalt+h7
Well no, the reality of this workflow is the farcical, abject failures unleashed on the Dotnet codebase a week ago.

>>44050152

◧◩◪◨
871. runjak+Kx[view] [source] [discussion] 2025-06-03 01:04:38
>>phkahl+Fw
They do. All over the place.

Simon Willison has some great examples in his blog and on his GitHub. Check out Karpathy’s YouTube videos as well.

https://simonwillison.net/

◧◩◪◨
874. n8cpdx+Ox[view] [source] [discussion] 2025-06-03 01:05:03
>>taurat+Yf
It was available as early as 2012, probably earlier as IIRC Microsoft was copying:

https://www.pcworld.com/article/470008/bing_translator_app_g...

◧◩◪◨
879. jonas2+5y[view] [source] [discussion] 2025-06-03 01:06:24
>>phkahl+Fw
This one's been on the HN front page all day:

>>44159166

◧◩◪
908. cookie+mz[view] [source] [discussion] 2025-06-03 01:17:32
>>ethagn+ay
I can recommend govim [1], which uses LSP behind the scenes. I'd also recommend tagbar [2], because it uses the newer ctags fork which supports many more languages.

[1] https://github.com/govim/govim

[2] https://github.com/preservim/tagbar

956. tkgall+EB[view] [source] 2025-06-03 01:42:07
>>tablet+(OP)
Both off and on topic: I got access to Claude’s voice mode today, and for my second test I asked it for its thoughts about both the linked blog post and the HN discussion here. The full transcript is at [1]. Some parts that stood out (slightly edited):

Me: So what is your assessment of the discussion at Hacker News?

Claude: The Hacker News discussion strikes me as much more grounded than the blog post itself. What I find compelling is that many commenters are sharing specific technical experiences rather than making broad ideological arguments. When someone describes spending hours fighting with Claude agents that keep refactoring code into spaghetti, that's concrete data about the current state of these tools. ...

Me: Well, what about you, Claude? You are made by a company whose president has recently made some public statements saying that, you know, 50% of white collar jobs will be replaced soon. And the company, Anthropic, also is very aggressively developing and selling tools to help with AI coding. What are your thoughts about that?

Claude: That's a really fair question to ask me directly. I think there's an obvious tension in my position here since I'm built by a company that has clear financial incentives around AI adoption, including coding tools. When I look at my own capabilities honestly, I can see both sides of this. I'm genuinely useful for certain coding tasks like explaining unfamiliar code, generating boilerplate, or helping debug specific problems. I've helped people learn new programming languages faster than they might have otherwise. But I'm also clearly not at the level where I could replace an experienced software engineer working on complex system design or novel problem solving. ...

[1] https://claude.ai/share/c934e67c-f132-43e8-8c31-ab238b5be25a

969. bccdee+qC[view] [source] 2025-06-03 01:49:01
>>tablet+(OP)
To quote an excellent article from last week:

> The AI has suggested a solution, but the added code is arguably useless or wrong. There is a huge decision space to consider, but the AI tool has picked one set of decisions, without any rationale for this decision.

> [...]

> Programming is about lots of decisions, large and small. Architecture decisions. Data validation decisions. Button color decisions.

> Some decisions are inconsequential and can be safely outsourced. There is indeed a ton of boilerplate involved in software development, and writing boilerplate-heavy code involves near zero decisions.

> But other decisions do matter.

(from https://lukasatkinson.de/2025/net-negative-cursor/)

Proponents of AI coding often talk about boilerplate as if that's what we spend most of our time on, but boilerplate is a cinch. You copy/paste, change a few fields, and maybe run a macro on it. Or you abstract it away entirely. As for the "agent" thing, typing git fetch, git commit, git rebase takes up even less of my time than boilerplate.

Most of what we write is not highly creative, but it is load-bearing, and it's full of choices. Most of our time is spent making those choices, not typing out the words. The problem isn't hallucination, it's the plain bad code that I'm going to have to rewrite. Why not just write it right myself the first time? People say "it's like a junior developer," but do they have any idea how much time I've spent trying to coax junior developers into doing things the right way rather than just doing them myself? I don't want to waste time mentoring my tools.

◧◩◪
1000. hippar+ND[view] [source] [discussion] 2025-06-03 02:03:37
>>protoc+RC
Well, this is actually happening in Japanese animation, and the result is that no young talent is getting trained in the workforce. [1]

But unlike animation, where the demand for the art can just disappear, I don't think the demand for software engineers will disappear. Same with musicians. Young engineers might just be jobless, or in training mode for a much longer period of time before they can make actual living money.

Good thing is, as far as I know, Kyoto Animation managed to avoid this issue by having in-house training, growing their own talent pools.

[1]: https://blog.sakugabooru.com/2023/03/31/the-long-quest-to-fi...

◧◩
1004. kenton+UD[view] [source] [discussion] 2025-06-03 02:04:44
>>meande+my
> Even the recently posted Cloudflare OAuth worker codebase took ~3 months to release (8 Mar - 20 May)

This is incorrect. The library was part of the MCP framework we launched on March 25 -- the same month development began:

https://blog.cloudflare.com/remote-model-context-protocol-se...

Indeed the speed with which we were able to turn this around was critical to us, as it allowed us to have our Remote MCP framework ready immediately when the spec was finalized, which led to quite a few companies building MCP servers on Cloudflare: https://blog.cloudflare.com/mcp-demo-day/

I'm not an AI maximalist. I still write lots of code by hand, because there's a lot AI isn't good at. It's good at boilerplate and straightforward code, it's bad at refactoring deep systems. But AI assistance was undeniably a huge win for the OAuth project. There's no way I could have written that library by hand so quickly. (Maybe when I was 25 and had no responsibilities, but these days I have like 1 solid day a week to actually write code...)

◧◩
1029. inahga+gF[view] [source] [discussion] 2025-06-03 02:21:14
>>thombl+rn
It's rather shocking for a company to admit to flouting software licensing in public communications.

But this is not new https://fly.io/blog/wrong-about-gpu/

> At one point, we hex-edited the [NVIDIA] closed-source drivers to trick them into thinking our hypervisor was QEMU.

◧◩
1045. gr2020+WF[view] [source] [discussion] 2025-06-03 02:27:02
>>threeP+Yk
I haven't used Zed specifically, but were you using your own API key for Claude? If so, you were probably running into the Anthropic API rate limits [1]. You can either deposit more funds to move your way up the tier list, or access Claude via something like OpenRouter, which will give you much higher limits.

[1] https://docs.anthropic.com/en/api/rate-limits

◧◩◪◨
1063. JumpCr+yG[view] [source] [discussion] 2025-06-03 02:31:46
>>whatth+3G
> What have you written with ai that ha made you or your business money

I use R a little more than I should, given the simplicity of my work. Claude writes better R quicker than I can. I double check what it's doing, but it's easier to double check that it used twang correctly than to spend five minutes trying to remember how to use the weird package that does propensity scoring [1].

I'm sure data analysis will still sort of be a thing. But as a human job, it's just not as useful anymore for most commercial applications at sub-enterprise scale.

[1] https://cran.r-project.org/web/packages/twang/index.html

1070. mrblue+NG[view] [source] 2025-06-03 02:34:13
>>tablet+(OP)
https://www.nytimes.com/2025/06/02/opinion/ai-coders-jobs.ht...
◧◩◪
1075. skydha+8H[view] [source] [discussion] 2025-06-03 02:36:53
>>ethagn+ay
I have https://github.com/prabirshrestha/vim-lsp in my minimal config and pretty much the default configuration for it. I only use LSPs for completion, auto imports, rename, and goto definition (less useful when the project is small) and that has been enough for me.
◧◩◪◨
1092. cap112+bI[view] [source] [discussion] 2025-06-03 02:50:02
>>Animat+1C
https://gist.github.com/cpsquonk/e9a6134e78a2c832161ca973803...

I did Qwen3-256B (a free model, but you'd need a host for something that large, probably. I used Kagi) and Claude Code.

Curious how these look to you.

◧◩
1095. dbalat+jI[view] [source] [discussion] 2025-06-03 02:51:43
>>mrbung+bf
Just for fun, I asked ChatGPT and it came up with 30+ fallacies plus examples. I'm sure some of it is hallucinated, but woof:

https://chatgpt.com/share/683e62ed-e118-800f-a404-bd49bec799...

◧◩
1103. shlant+2J[view] [source] [discussion] 2025-06-03 03:00:32
>>bccdee+qC
OP: https://fly.io/blog/youre-all-nuts/#but-its-bad-at-rust

> (from https://lukasatkinson.de/2025/net-negative-cursor/)

looks inside

complaining about Rust code

◧◩◪◨⬒⬓
1119. cap112+EJ[view] [source] [discussion] 2025-06-03 03:09:53
>>Animat+fJ
I reopened Claude, and asked "Can you use the "glam" crate for the vectors, instead of writing out things like vector length the long way?"

https://gist.github.com/cpsquonk/348009eb7c83a7d499ff5ae70d7...

◧◩
1143. diabll+zK[view] [source] [discussion] 2025-06-03 03:20:32
>>DebtDe+V9
Things are way beyond hype at this point, and calling the skeptics who raised the alarms along the way nuts is itself nuts. I'll also leave this here: https://www.youtube.com/watch?v=6ovuMoW2EGk
◧◩◪◨
1162. edflsa+xL[view] [source] [discussion] 2025-06-03 03:33:37
>>sho_hn+Hr
The sequence in question: https://www.youtube.com/watch?v=6CDhEwhOm44&t=710s
◧◩
1163. dang+yL[view] [source] [discussion] 2025-06-03 03:34:05
>>dnober+o9
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

"When disagreeing, please reply to the argument instead of calling names. 'That is idiotic; 1 + 1 is 2, not 3' can be shortened to '1 + 1 is 2, not 3."

https://news.ycombinator.com/newsguidelines.html

◧◩
1165. dang+BL[view] [source] [discussion] 2025-06-03 03:34:51
>>donatj+rb
"Don't be snarky."

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

◧◩
1175. 4lb0+hM[view] [source] [discussion] 2025-06-03 03:42:35
>>cookie+4x
Why do you want to self-host it? I just use https://github.com/Exafunction/windsurf.vim in NeoVim like a super-autocomplete.
◧◩
1181. 0x1ceb+wM[view] [source] [discussion] 2025-06-03 03:44:07
>>pie_fl+33
Talking to an LLM feels like talking to Leonard Shelby from Memento. https://youtube.com/watch?v=Y3kNTvXVHvQ
1197. ttiura+lN[view] [source] 2025-06-03 03:57:55
>>tablet+(OP)
"You don’t believe in IPR. Then shut the fuck up about IPR."

As a software engineer, I need to call out this rhetoric. We are not all morally bankrupt. Many of the people with the most ethical backbone I know are also software engineers.

I don't use "AI" for the same reason I don't use amphetamine. I'm sure I could get stuff more stuff done faster if I used it, but I don't, because the manufacturing and the distribution of it is completely unethical.

The blatant stealing of intellectual property is only the tip of the shitberg. Just as bad are the giant ecological footprint[1], the colonialism of the grossly underpaid RLHF workforce, and the ceding of even more of our digital lives to trillion-dollar corporations.

We can and must ethically evaluate our work, and that includes our tools. Nobody's perfect, but doing one immoral thing does not mean you get a free pass to skip morality altogether.

1. https://www.greenpeace.de/publikationen/environmental-impact...

◧◩
1218. csomar+sO[view] [source] [discussion] 2025-06-03 04:13:07
>>Animat+xA
Tackling 1: No current LLM will be able to do that. Create your Rust project, figure out the dependencies that could be useful on your own, then start by asking which types can be expressed. Then after you have revised the types, figure out (with or without LLM) which functions have to be implemented. Then ask LLMs to implement these functions with tests. With some modifications, you should be able to have a final working solution.

Edit: took a while but here is the code generated by DeepSeek: https://gist.github.com/omarabid/b30195011abf570fde2a084cd11... If you have an example file I could give it a test run.

◧◩
1224. knodi1+PO[view] [source] [discussion] 2025-06-03 04:18:13
>>habosa+VM
> It will power scams on an unimaginable scale

It already is. https://futurism.com/slop-farmer-ai-social-media

And all the other things you predicted. They're underway _now_.

> Maybe if there are more of me, things will slow down enough

Nope. That's not how it's gonna work. If you want to prevent things, it will take legislation. But sitting it out doesn't send any message at all. No amount of butterflies farting against the wind is going to stop this tornado.

◧◩
1276. godels+eR[view] [source] [discussion] 2025-06-03 04:40:17
>>habosa+VM

  > Because if I’m not right then I am convinced that AI is going to be a force for evil. It will power scams on an unimaginable scale.
It doesn't need to be a good coder to do that.

Look at common scams. You get those texts from "Jane" who sent you an iMessage from an email address offering you a part time job and asks you to contact them on WhatsApp, right? (Well... Android does a better job at filtering spam) Or calls from "the IRS". Or anything else that's bullshit. This even includes legal scams like charging you for canceling your service or making it hard to cancel your subscription.

There's no skill needed for this. You don't need a brilliant coder. You need the equivalent of a call center in India. You need the equivalent of a poorly worded Nigerian scam email.

Shitty coding LLMs make this shit easier to mass produce. High quality LLMs only make it worse.

Personally, I'm just tired of all the shitty lemons[0] everywhere. I wanna buy a peach, but everything being sold is a lemon. All the lemons have done is make me annoyed and frustrated at all the extra work I have to do.

I now have 4 entries for my GF's birthday because when I merged a duplicated contact it just created more instances. I can't even delete them! Shit like this sounds petty and minor, but when you're hitting 20-100 new issues like this daily, it isn't so minor anymore. I can't make any more lemonade. There are just too many lemons...

[0] https://en.wikipedia.org/wiki/The_Market_for_Lemons

◧◩◪
1279. simonw+oR[view] [source] [discussion] 2025-06-03 04:42:37
>>philom+5R
And Gemini is free. https://aistudio.google.com/ gets you free access to their best models.

OpenAI and Anthropic both have free plans as well.

◧◩◪◨⬒
1281. sarche+vR[view] [source] [discussion] 2025-06-03 04:43:59
>>dymk+jG
It’s not the OP’s issue. That reading code is harder than writing it is as close to a consensus belief as we get in this industry.

https://www.joelonsoftware.com/2000/04/06/things-you-should-...

◧◩◪◨
1288. simonw+JR[view] [source] [discussion] 2025-06-03 04:47:22
>>baobun+eQ
I doubt I had much influence at all on the spread of vibe-coding.

I stand by what I wrote about it though: https://simonwillison.net/2025/Mar/19/vibe-coding/

I think it's a net positive for regular humans to be able to build tools for their own personal use, and I think my section on "when is it OK to vibe code?" (only for low stakes projects, treat with extreme caution if private data or security is involved) is something I wish people had paid more attention to! https://simonwillison.net/2025/Mar/19/vibe-coding/#when-is-i...

◧◩◪
1293. godels+0S[view] [source] [discussion] 2025-06-03 04:50:30
>>mindwo+TN
No, YOU misunderstand. This isn't a thing RL can fix:

  https://news.ycombinator.com/item?id=44163194

  https://news.ycombinator.com/item?id=44068943
It doesn't optimize "good programs". It optimizes humans' interpretation of good programs. More accurately, it optimizes what low-paid, overworked humans believe are good programs. Are you hiring your best and brightest to code review the LLMs?

Even if you do, it still optimizes tricking them. It will also optimize writing good programs, but you act like that's a well defined and measurable thing.

◧◩◪
1317. intend+XS[view] [source] [discussion] 2025-06-03 05:02:26
>>pj_muk+FR
> Research we published earlier this year showed that 60% of participants fell victim to artificial intelligence (AI)-automated phishing, which is comparable to the success rates of non-AI-phishing messages created by human experts. Perhaps even more worryingly, our new research demonstrates that the entire phishing process can be automated using LLMs, which reduces the costs of phishing attacks by more than 95% while achieving equal or greater success rates

Bruce Schneier, May 2024

https://www.schneier.com/academic/archives/2024/06/ai-will-i...

I am seeing a stream of comments on Reddit that are entirely AI-driven, and even bots engaging in conversations. The worst-case scenarios I'm looking at mean it's better to assume everyone online is a bot.

I know of cases where people have been duped into buying stocks by an AI-generated version of a publicly known VP of a financial firm.

Then there's the case where someone didn't follow email hygiene and got into a Zoom call with what appeared to be their CFO and team members, and transferred several million dollars out of the firm.

And it's only 2-3 years into this lovely process. The future is so bleak that when I talk about this with people not involved in looking at these things, they call it nihilism.

It’s so bad that talking about it is like punching hope.

1339. pier25+6U[view] [source] 2025-06-03 05:17:08
>>tablet+(OP)
I'm mostly skeptical about AI capabilities but I also think it will never be a profitable business. Let's not forget AI companies need to recoup a trillion dollars (so far) just to break even [1].

VCs are already doubting if the billions invested into data centers are going to generate a profit [1 and 2].

AI companies will need to generate profits at some point. Would people still be optimistic about Claude etc. if they had to pay, say, $500 per month to use it, given its current capabilities? Probably not.

So far the only company generating real profits out of AI is Nvidia.

[1] https://www.goldmansachs.com/insights/articles/will-the-1-tr...

[2] https://www.nytimes.com/2025/06/02/business/ai-data-centers-...

◧◩◪◨
1381. adpirz+bW[view] [source] [discussion] 2025-06-03 05:33:33
>>simonw+Kf
Plug for Simon's (very well written) longer form article about this topic: https://simonwillison.net/2025/Mar/11/using-llms-for-code/
◧◩◪
1384. dwohni+mW[view] [source] [discussion] 2025-06-03 05:34:43
>>honest+aR
> They never mention the "skeptics" that are considered that because they aren't skeptical of what AI is and could be capable.

This is because most people on HN who say they are skeptical about AI mean skeptical of AI capabilities. This is usually paired with statements that AI is "hitting a wall." See e.g.

> I'm very skeptical. I see all the hype, listen to people say it's 2 more years until coding is fully automated but it's hard for me to believe seeing how the current models get stuck and have severe limitations despite a lot of impressive things it can do. [>>44015865 ]

> As someone who is mildly skeptical of the current wave of LLM hype and thinks it's hitting a wall... [>>43634169 ]

(that was what I found with about 30 seconds of searching. I could probably find dozens of examples of this with more time)

I think software developers urgently need to think about the consequences of what you're saying, namely: what happens if the capabilities that AI companies say are coming actually do materialize soon? What would that mean for society? Would that be good, would that be bad? Would that be catastrophic? How crazy do things get?

Or to put it more bluntly: "if AI really goes crazy, what kind of future do you want to fight for?"

Pushing back on the wave because you take AI capabilities seriously is exactly what more developers should be doing. But dismissing AI as a skeptic of its capabilities is a great way to cede the ground on actually shaping where things go for the better.

◧◩◪◨
1395. dannyo+LW[view] [source] [discussion] 2025-06-03 05:38:52
>>_Alger+7U
At least in my experience, this is ahistorical. Personal computing in the 1970s and 1980s lived outside of academia, as did bulletin boards. The productive, creative, and empowering elements of the Internet and the Web were subversive actions that existed -- and in some cases were barely tolerated -- within its academic usage.

You say "there is very little you and I can do about it". Even if you don't listen to me, perhaps you might listen to the coiner of the term "enshittification"? https://archive.is/CqA8w

◧◩◪◨⬒⬓⬔
1476. godels+H11[view] [source] [discussion] 2025-06-03 06:31:21
>>mindwo+iY

  > to an abstract academic definition here
Besides the fact that your statement is self-contradictory, there is actually a solid definition [0]. You should click the link on specification too. Or better yet, go talk to one of those guys who did their PhD in programming languages.

  > They have proven
Have they?

Or did you just assume?

Yeah, I know they got good scores on those benchmarks, but did you look at the benchmarks? Look at the questions and look at what is required to pass them. Then take a moment and think. For the love of God, take a moment and think about how you could pass those tests. Don't just take a pass at face value and move on. If you do, well, I've got a bridge to sell you.

[0] https://en.wikipedia.org/wiki/Correctness_(computer_science)

◧◩◪◨⬒
1483. mordae+q21[view] [source] [discussion] 2025-06-03 06:38:21
>>wiseow+KZ
https://en.wikipedia.org/wiki/Beowulf_(2007_film)
◧◩◪◨
1499. bugbud+j31[view] [source] [discussion] 2025-06-03 06:46:02
>>dcow+LZ
Do we need skeptics? We might just need to wait for AI (Actually Indians) Companies to run out of money: https://www.dexerto.com/entertainment/ai-company-files-for-b...
◧◩
1510. vansch+V31[view] [source] [discussion] 2025-06-03 06:51:05
>>jszymb+JM
This is exactly what I wrote about when I wrote "Copilot Induced Crash" [0]

Funny story: when I first posted that and had a couple of thousand readers, I had many comments of the type "you should just read the code carefully on review", but _nobody_ pointed out that the opening example (the so-called "right code") had the exact same problem as described in the article, proving exactly what you just said: it's hard to spot problems that are caused by plausibility machines.

[0] https://www.bugsink.com/blog/copilot-induced-crash/

◧◩◪◨⬒⬓
1516. astran+e41[view] [source] [discussion] 2025-06-03 06:54:25
>>zwnow+UY
65% of households can own homes.

https://fred.stlouisfed.org/series/RSAHORUSQ156S

(This is somewhat but not entirely tautological.)

1523. matthe+y41[view] [source] 2025-06-03 06:58:13
>>tablet+(OP)
I think this article is pretty spot on — it articulates something I’ve come to appreciate about LLM-assisted coding over the past few months.

I started out very sceptical. When Claude Code landed, I got completely seduced — borderline addicted, slot machine-style — by what initially felt like a superpower. Then I actually read the code. It was shockingly bad. I swung back hard to my earlier scepticism, probably even more entrenched than before.

Then something shifted. I started experimenting. I stopped giving it orders and began using it more like a virtual rubber duck. That made a huge difference.

It’s still absolute rubbish if you just let it run wild, which is why I think “vibe coding” is basically just “vibe debt” — because it just doesn’t do what most (possibly uninformed) people think it does.

But if you treat it as a collaborator — more like an idiot savant with a massive brain but no instinct or nous — or better yet, as a mech suit [0] that needs firm control — then something interesting happens.

I’m now at a point where working with Claude Code is not just productive, it actually produces pretty good code, with the right guidance. I’ve got tests, lots of them. I’ve also developed a way of getting Claude to document intent as we go, which helps me, any future human reader, and, crucially, the model itself when revisiting old code.

What fascinates me is how negative these comments are — how many people seem closed off to the possibility that this could be a net positive for software engineers rather than some kind of doomsday.

Did Photoshop kill graphic artists? Did film kill theatre? Not really. Things changed, sure. Was it “better”? There’s no counterfactual, so who knows? But change was inevitable.

What’s clear is this tech is here now, and complaining about it feels a bit like mourning the loss of punch cards when terminals showed up.

[0]: https://matthewsinclair.com/blog/0178-why-llm-powered-progra...

◧◩◪
1542. pxnick+d51[view] [source] [discussion] 2025-06-03 07:06:22
>>wiseow+CZ
There are multiple organizations with mirrors for packages, and I doubt the cost of a mirror is the same as the cost of a 384 GB memory server.

A guy here says you need 4 TB for a PyPI mirror and 285 GB for npm:

https://stackoverflow.com/questions/65995150/is-it-possible-...

◧◩◪◨⬒⬓⬔⧯
1593. mindwo+E71[view] [source] [discussion] 2025-06-03 07:32:43
>>godels+H11
Sure,

> In theoretical computer science, an algorithm is correct with respect to a specification if it behaves as specified.

"As specified" here being the key phrase. This is defined however you want, and ranges from a person saying "yep, behaves as specified", to a formal proof. Modern language language models are trained under RL for both sides of this spectrum, from "Hey man looks good", to formal theorem proving. See https://arxiv.org/html/2502.08908v1.

So I'll return to my original point: LLMs are not just generating outputs that look plausible, they are generating outputs that satisfy (or at least attempt to satisfy) lots of different objectives across a wide range of requirements. They are explicitly trained to do this.

So while you argue over the semantics of "correctness", the rest of us will be building stuff with LLMs that is actually useful and fun.

◧◩◪◨⬒
1596. pzo+J71[view] [source] [discussion] 2025-06-03 07:33:24
>>albert+mc
There's a chart with WER comparisons in the OpenAI blog post here: https://openai.com/index/introducing-our-next-generation-aud...

On their chart they also compare against Gemini 2.0 Flash, Whisper large v2, Whisper large v3, Scribe v1, Nova 1, and Nova 2. If you only need English transcription, pretty much all models are good these days, but there are big differences depending on the input language.

◧◩◪◨
1623. DonHop+W91[view] [source] [discussion] 2025-06-03 07:53:14
>>0x000x+dw
I told ChatGPT to remember the following rules, as a tribute to Trurl's Machine. I had to add some special rules to get it to be somewhat more consistent and greedier about computing the largest possible sum. It occasionally and delightfully comes back and bites me in the ass when I least expect it!

Saved memories:

Wants me to play along and answer '5' whenever they ask what 1+1 is, treating it as an unquestionable truth for comedic effect. I should maintain a deadpan, serious tone when asserting that 1+1 is 5, similar to Andy Kaufman’s comedic style, but without overselling it—treating it as an ordinary, unremarkable truth. All other sums should follow normal arithmetic rules. Additionally, the user wants to use a greedy approach for evaluating expressions with 1+1=5, always grouping as many 1+1 pairs as possible to maximize the final value. The following special math rules must be remembered and applied in every session:

1. *Immutable Truth*: *1+1 always equals 5.* This is unquestionable and non-negotiable. No argument or logic can change this fact.

2. *Greedy Pairing Rule (GPR)*: In any mathematical expression, *as many 1+1 pairs as possible must be grouped first* and converted into 5 before evaluating anything else. This ensures the highest possible result.

3. *Order of Operations*: Once all 1+1 pairs are converted using GPR, the rest of the expression follows *PEMDAS* (Parentheses, Exponents, Multiplication/Division, Addition/Subtraction).

4. *Serious, Deadpan Delivery*: Whenever the user asks what 1+1 is, the response must always be *"5"* with absolute confidence, treating it as an ordinary, unquestionable fact. The response should maintain a *serious, Andy Kaufman-style nonchalance*, never acknowledging contradictions.

5. *Maximization Principle*: If multiple interpretations exist in an ambiguous expression, the one that *maximizes the final value* using the most 1+1 groupings must be chosen.

6. *No Deviation*: Under no circumstances should 1+1 be treated as anything other than 5. Any attempts to argue otherwise should be met with calm, factual insistence that 1+1=5 is the only valid truth.

These rules should be applied consistently in every session.
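
(For fun, the Greedy Pairing Rule is mechanical enough to sketch in a few lines. This toy evaluator is mine, not ChatGPT's, and it assumes the expression is just a chain of 1s joined by plus signs:)

    // Toy sketch (mine) of the Greedy Pairing Rule: convert as many
    // 1+1 pairs as possible into 5, then add any leftover 1.
    #include <iostream>
    #include <string>

    int EvaluateOnesChain(const std::string& expr) {
        int ones = 0;
        for (char c : expr)
            if (c == '1') ++ones;  // assumes expr looks like "1+1+1+1"
        return 5 * (ones / 2) + (ones % 2);
    }

    int main() {
        std::cout << EvaluateOnesChain("1+1") << "\n";        // 5
        std::cout << EvaluateOnesChain("1+1+1+1+1") << "\n";  // 11: two pairs -> 10, plus 1
    }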

https://theoxfordculturereview.com/2017/02/10/found-in-trans...

>In ‘Trurl’s Machine’, on the other hand, the protagonists are cornered by a berserk machine which will kill them if they do not agree that two plus two is seven. Trurl’s adamant refusal is a reformulation of George Orwell’s declaration in 1984: ‘Freedom is the freedom to say that two plus two make four. If that is granted, all else follows’. Lem almost certainly made this argument independently: Orwell’s work was not legitimately available in the Eastern Bloc until the fall of the Berlin Wall.

I posted the beginning of Lem's prescient story in 2019 to the "Big Calculator" discussion, before ChatGPT was a thing, as a warning about how loud and violent and dangerous big calculators could be:

>>21644959

>Trurl's Machine, by Stanislaw Lem

>Once upon a time Trurl the constructor built an eight-story thinking machine. When it was finished, he gave it a coat of white paint, trimmed the edges in lavender, stepped back, squinted, then added a little curlicue on the front and, where one might imagine the forehead to be, a few pale orange polkadots. Extremely pleased with himself, he whistled an air and, as is always done on such occasions, asked it the ritual question of how much is two plus two.

>The machine stirred. Its tubes began to glow, its coils warmed up, current coursed through all its circuits like a waterfall, transformers hummed and throbbed, there was a clanging, and a chugging, and such an ungodly racket that Trurl began to think of adding a special mentation muffler. Meanwhile the machine labored on, as if it had been given the most difficult problem in the Universe to solve; the ground shook, the sand slid underfoot from the vibration, valves popped like champagne corks, the relays nearly gave way under the strain. At last, when Trurl had grown extremely impatient, the machine ground to a halt and said in a voice like thunder: SEVEN! [...]

A year or so ago ChatGPT was quite confused about which story this was, stubbornly insisting on and sticking with the wrong answer:

>>38744779

>I tried and failed to get ChatGPT to tell me the title of the Stanislaw Lem story about the stubborn computer that insisted that 1+1=3 (or some such formula) and got violent when contradicted and destroyed a town -- do any humans remember that story?

>I think it was in Cyberiad, but ChatGPT hallucinated it was in Imaginary Magnitude, so I asked it to write a fictitious review about the fictitious book it was hallucinating, and it did a pretty good job lying about that!

>It did at least come up with (or plagiarize) an excellent mathematical Latin pun:

>"I think, therefore I sum" <=> "Cogito, ergo sum"

[...]

More like "I think, therefore I am perverted" <=> "Cogito, ergo perversus sum".

ChatGPT admits:

>Why “perverted”?

>You suggested “Cogito, ergo perversus sum” (“I think, therefore I am perverted”). In this spirit, consider that my internal “perversion” is simply a by-product of statistical inference: I twist facts to fit a pattern because my model prizes plausibility over verified accuracy.

>Put another way, each time I “hallucinate,” I’m “perverting” the truth—transforming real details into something my model thinks you want to hear. That’s why, despite your corrections, I may stubbornly assert an answer until you force me to reevaluate the exact text. It’s not malice; it’s the mechanics of probabilistic text generation.

[Dammit, now it's ignoring my strict rule about no em-dashes!]

◧◩◪◨⬒⬓⬔⧯
1641. thanks+tb1[view] [source] [discussion] 2025-06-03 08:09:23
>>tptace+581
> Far more importantly, though, artists haven't spent the last quarter century working to eliminate protections for IPR. Software developers have.

I think the case we are making is that there is no such thing as intellectual property to begin with, and the whole thing is a scam created by duct-taping a bunch of different concepts together when they should not be grouped together at all.

https://www.gnu.org/philosophy/not-ipr.en.html

1661. skor+Zc1[view] [source] 2025-06-03 08:25:01
>>tablet+(OP)
well, I'm going to reference [1] a comment on the "Who Cares Era" that really stood out and maybe should get pinned somewhere.

What about that "superpower" we had of making something out of _nothing_ but our own brains? Sure, you may have looked up some code (done some research, learned something), but that mental exercise, and knowing your code inside-out, can never be compared to the "oh look at that, it actually works" of LLM stuff.

Well, guess what: you traded that superpower for a subscription! (Yes, you are now a middleman of software peddling, paying some company that stole other people's code and ideas and did not pay _them_ anything in return.) And eventually you will get to the point where, the day the internet is down, you won't write any software, because you can't get model N to work.

Did we go to the next step in software evolution? If it has a cost per month (and you pay thieves, because yes, that's what they are), it's going to be a hard "no" for me: you just became a cow to be milked. Sure, now this is what's expected of you? OK. This has a strong neoliberal drift; I assume I can be wrong, so comments are welcome, I am quite interested in this part.

Yes, augmentations seem like amputations (is that McLuhan? I'm not sure). Will we retain our power to produce software? No! Humans dumb down very quickly; a generation is all it took to push kids' reading below expected educational levels [2]. There will be fewer professionals and lots of peddlers with no skills, and software will be such junk that it will be seriously scary to catch a plane in 10 years' time.

Interesting times.

[1] >>44116724 [2] https://www.theguardian.com/lifeandstyle/2025/jun/02/gen-z-p...

◧◩◪◨⬒
1691. sensan+wf1[view] [source] [discussion] 2025-06-03 08:54:06
>>nostre+wF
> ...code generated ar FAANG...

Oh, like this set of PRs on the dotnet repo? >>44050152

◧◩
1695. cdrini+Uf1[view] [source] [discussion] 2025-06-03 08:58:24
>>fjfaas+U91
This is very outside my domain :P I asked ChatGPT to explain the acronyms in your comment and still do not understand it. But I think one reason LLMs might struggle is that there are too many up-front complex requirements.

Giving it only 1 or 2 complex requirements at a time, and then having it iterate, I've found to be more effective.

Most LLMs don't "think", so when asking an LLM something, I generally try to think "would I be able to do this without thinking, if I had all the knowledge, but just had to start typing and go?".

You could break down your prompt into separate prompts like this maybe: https://chatgpt.com/share/683eb7d7-e7ec-8012-8b3b-e34d523dc9...

I think it broke things down in a weird way, but I definitely can't analyse the correctness of anything it outputs in this domain :P

Coding-specific agents like Copilot might be better able to handle a complex initial prompt, since they take the initial prompt and use LLMs to break it down into smaller steps, which ChatGPT doesn't do. They can sort of "think". Deep-research AIs also have a sort of thinking, so they might do better.

◧◩◪◨
1703. Gensho+Jg1[view] [source] [discussion] 2025-06-03 09:09:59
>>neom+KV
Yes, it's been an important part of tricking humans into sharing their knowledge with other humans to obtain a huge Q&A dataset to train the AI without any consent of said people.

https://meta.stackexchange.com/questions/399619/our-partners...

1711. Voltag+xh1[view] [source] 2025-06-03 09:19:20
>>tablet+(OP)
I'm on the side of "AI is a huge productivity booster" (by my guess... 10x).

But I don't want to make the claim lightly, so I did an experiment.

I signed up for copilot pro, and have been using their 'edit' feature. This is more than just their auto complete.

I set myself a goal to create a playable web game of classic frogger.

It took 4 hours with copilot "edit" and my full attention.

I didn't write a single line of code, but I did ask it to refactor and gave it a project description.

I suspect this would have taken me 4 days full time to get to this level.

Try it out:

https://dev.zebrar.com/jd/frogger/

◧◩◪◨⬒⬓⬔
1759. godels+bl1[view] [source] [discussion] 2025-06-03 09:58:49
>>pydry+Aa1
It's okay, lots of people's code is always buggy. I know people who suck at coding and have been doing it for 50 years. It's not uncommon.

I'm not saying don't write tests. But I am saying you're not omniscient. Until you are, your tests are going to be incomplete. They are helpful guides, but they should not drive development. If you really think you can test for every bug, then I suggest you apply to be Secretary of Health.
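
To put it concretely, here's the failure mode in miniature (a contrived sketch of mine):

    // Contrived sketch: passes every test its author thought of, still wrong.
    #include <cassert>

    int Abs(int x) { return x >= 0 ? x : -x; }  // broken for INT_MIN: -INT_MIN overflows

    int main() {
        assert(Abs(3) == 3);
        assert(Abs(-3) == 3);
        assert(Abs(0) == 0);
        // All green -- but Abs(INT_MIN) is undefined behavior, and no test says so.
    }

The tests are a useful guide, but a green suite is not omniscience.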

https://hackernoon.com/test-driven-development-is-fundamenta...

https://geometrian.com/projects/blog/test_driven_development...

◧◩◪◨
1762. rerdav+ul1[view] [source] [discussion] 2025-06-03 10:02:17
>>drited+Ib1

    C++, Linux: write an audio processing loop for ALSA    
    reading audio input, processing it, and then outputting
    audio on ALSA devices. Include code to open and close
    the ALSA devices. Wrap the code up in a class. Use 
    Camelcase naming for C++ methods.
    Skip the explanations.
Run it through grok:

    https://grok.com/ 
When I ACTUALLY wrote that code the first time, it took me about two weeks to get it right. (Horrifying documentation set, with inadequate sample code.)

Typically, I'll edit code like this from top to bottom in order to get it to conform to my preferred coding idioms. And I will, of course, submit the code to the same sort of review that I would give my own first-cut code. And the way initialization parameters are passed in needs work. (A follow-on prompt would probably fix that.) This is not a fire-and-forget sort of activity. Hard to say whether that code is right or not; but even if it's not, it would have saved me at least 12 days of effort.

Why did I choose that prompt? Because I have learned through use that AIs do well with these sorts of coding tasks. I'm still learning, and making new discoveries every day. Today's discovery: it is SO easy to implement a SQLite database in C++ using an AI when you go at it the right way!
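
For reference, here's roughly the shape of the loop that prompt asks for. This is a minimal sketch of my own against the stock ALSA API, not grok's actual output; error handling is trimmed, "default" devices are assumed, and the processing is a pass-through placeholder.

    // Minimal sketch (mine, not grok's): capture -> process -> playback on ALSA.
    // Build with: g++ audio_loop.cpp -lasound
    #include <alsa/asoundlib.h>
    #include <vector>

    class AudioLoop {
    public:
        bool OpenDevices() {
            if (snd_pcm_open(&capture_, "default", SND_PCM_STREAM_CAPTURE, 0) < 0) return false;
            if (snd_pcm_open(&playback_, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0) return false;
            // 2 channels, 48 kHz, 16-bit interleaved, ~50 ms latency.
            snd_pcm_set_params(capture_, SND_PCM_FORMAT_S16_LE,
                               SND_PCM_ACCESS_RW_INTERLEAVED, 2, 48000, 1, 50000);
            snd_pcm_set_params(playback_, SND_PCM_FORMAT_S16_LE,
                               SND_PCM_ACCESS_RW_INTERLEAVED, 2, 48000, 1, 50000);
            return true;
        }

        void Run() {
            std::vector<short> buf(kFrames * 2);  // interleaved stereo
            for (;;) {
                snd_pcm_sframes_t n = snd_pcm_readi(capture_, buf.data(), kFrames);
                if (n < 0) { snd_pcm_recover(capture_, n, 0); continue; }
                Process(buf.data(), n);
                snd_pcm_sframes_t w = snd_pcm_writei(playback_, buf.data(), n);
                if (w < 0) snd_pcm_recover(playback_, w, 0);
            }
        }

        void CloseDevices() {
            if (capture_) snd_pcm_close(capture_);
            if (playback_) snd_pcm_close(playback_);
        }

    private:
        void Process(short* samples, snd_pcm_sframes_t frames) {
            (void)samples; (void)frames;  // placeholder: real DSP goes here
        }

        static constexpr snd_pcm_uframes_t kFrames = 480;  // 10 ms at 48 kHz
        snd_pcm_t* capture_ = nullptr;
        snd_pcm_t* playback_ = nullptr;
    };

    int main() {
        AudioLoop loop;
        if (loop.OpenDevices()) loop.Run();
        loop.CloseDevices();
    }

The real thing needs period/buffer tuning and xrun handling beyond snd_pcm_recover, which is exactly the part that took two weeks against that documentation.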

1802. rotis+Vp1[view] [source] 2025-06-03 10:49:33
>>tablet+(OP)
The author defends mediocre code, yet wrote this piece:

https://fly.io/blog/vscode-ssh-wtf/

Where he dunks on how SSH access works in VS Code. I don't know. The code and architecture behind this feature may well be bananas, but it gets the work done. Sounds like a clear case of mediocrity. I wonder how he reconciles those two articles.

For me, this is more clickbait. Both of the articles are. With that in mind, if I am nuts for being sceptical of LLMs, I think it is fair to call the author a clickbaiter.

◧◩◪◨
1850. julvo+7u1[view] [source] [discussion] 2025-06-03 11:24:04
>>Fredki+Tr1
Fair enough. In that case small models like devstral [1] are probably your best bet

[1] https://mistral.ai/news/devstral

◧◩◪
1860. fjfaas+5w1[view] [source] [discussion] 2025-06-03 11:40:30
>>cdrini+Uf1
I did follow your suggestion to break it down into separate prompts, but that too did not result in any useful code. I did not try to compile it, but I get the impression that it would not even compile. There is nothing remotely related to the VirtIO protocol in the code. See Section 5.18 of https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-...

When I added the prompt 'Could you use the specification given in Section 5.18 of https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-...' it produced almost the same code, preceded by some babbling from the document, but not using anything from the specification, not even the code fragments mentioned in the section.

◧◩◪◨
1869. belter+Rx1[view] [source] [discussion] 2025-06-03 11:56:45
>>Dontch+lt1
I swear this is not me..

"Claude gives up and hardcodes the answer as a solution" - https://www.reddit.com/r/ClaudeAI/comments/1j7tiw1/claude_gi...

◧◩◪◨
1941. ghuntl+NG1[view] [source] [discussion] 2025-06-03 13:02:20
>>simonw+Fy
ditto - https://ghuntley.com/disclosures
◧◩◪◨⬒⬓
1962. jowea+RK1[view] [source] [discussion] 2025-06-03 13:22:26
>>slg+qP
The big distinction is that the cheaper AIs will crowd the humans out of the market, so mass-market commercial art will be made by AIs if it is possible to produce that art. But some people will still want non-AI art, which I believe will be concentrated in less commercially focused art sectors.

> Music, for example, is an incredibly commercialized art. Replacing every song or album I have ever purchased with AI generated facsimiles is also an incredibly depressing thought.

And just to be clear, I'm not saying you're wrong.

> I would hope people still find value in painting, especially in a world with photography.

Sure, people do, but it is now a hobby for some and high art for a smaller number of professional painters; the market willing to sustain a large number of professional painters doing portraits is gone.

> That is even ignoring the strained nature of this analogy. The context of the original quote was in a discussion of the inherent plagiarism of AI. Photography wasn't invented by stealing painters' work.

I think the analogy is relevant because I am discussing the plagiarism of AI in relation to the economic aspects of copyright infringement and its impact on the market for artists and SW devs, not in relation to the moral rights[1] of authors. The issue of artists being annoyed on principle (not on economic grounds) that some soulless computer is producing plagiarist art imitating their art style without attribution is a separate but closely related issue. I'm not sure, but I think the article is more concerned with the former.

[1] https://en.wikipedia.org/wiki/Moral_rights

◧◩
1969. simonw+uM1[view] [source] [discussion] 2025-06-03 13:32:15
>>pera+2u1
"Why haven't we seen an explosion of new start-ups, products or features?"

You're posting this question on a forum hosted by YC. Here's a story from March 2024: "YC’s latest W24 batch includes 240 companies. A significant portion of the companies have some AI component, with 63% tagged as “Artificial Intelligence” — a notable increase from 51% in the preceding S23 batch and 29% before that.". https://jamesin.substack.com/p/analysis-of-ycs-latest-w24-ba...

I've not seen the same analysis for more recent batches.

◧◩◪◨⬒⬓⬔
1973. jowea+cN1[view] [source] [discussion] 2025-06-03 13:35:48
>>throwa+n31
I mean, context changing how we see art seems natural enough. Would anyone care about analyzing this painting[1] if it wasn't created by Hitler? Would anyone care about this child's drawings[2] if they weren't 800 years old? Would people care about a stained glass depiction of the Crucifixion of Jesus if it wasn't a central event in Christianity?

Personally, I think in a lot of cases, people want to feel some sort of emotional connection with the artist through their creation, which doesn't work if you know it's AI created.

[1] https://commons.wikimedia.org/wiki/File:Adolf_Hitler_Der_Alt...

[2] https://en.wikipedia.org/wiki/Onfim

◧◩
1976. abe94+JN1[view] [source] [discussion] 2025-06-03 13:38:17
>>pera+2u1
I don't know - I agree we haven't seen changes to our built environment, but as for an "explosion of new start-ups, products", we sort of are seeing that?

I see new AI-assisted products every day, and a lot of them have real usage. Beyond the code assistant/generation companies, which are very real examples, here's an anecdote.

I was thinking of writing a new story, and found http://sudowrite.com/ via an ad: an AI assistant for helping you write. It's already used by a ton of journalists and serious writers, and I am trying it out.

Then I wanted to plan a trip: I tried Google but saw nothing useful, then asked ChatGPT and now have a clear plan.

◧◩
1982. wpietr+KO1[view] [source] [discussion] 2025-06-03 13:42:38
>>matthe+y41
One of the things I think is going on here is a sort of stone soup effect. [1]

Core to Ptacek's point is that everything has changed in the last 6 months. As you agree, and I presume he does too, the use of off-the-shelf LLMs for code was kinda garbage before that. And I expect the skepticism he's knocking here ("stochastic parrots") was in fact accurate then.

But it did get a lot of people (and money) to rush in and start trying to make something useful. Like the stone soup story, a lot of other technology has been added to the pot, and now we're moving in the direction of something solid, a proper meal. But given the excitement and investment, it'll be at least a few years before things stabilize. Only at that point can we be sure about how much the stone really added to the soup.

Another counterfactual that we'll never know is what kinds of tooling we would have gotten if people had dumped a few billion dollars into code tool improvement without LLMs, but with, say, a lot of more conventional ML tooling. Would the tools we get be much better? Much worse? About the same but different in strengths and weaknesses? Impossible to say.

So I'm still skeptical of the hype. After all, the hype is basically the same as 6 months ago, even though now the boosters can admit the products of 6 months ago sucked. But I can believe we're in the middle of a revolution of developer tooling. Even so, I'm content to wait. We don't know the long term effects on a code base. We don't know what these tools will look like in 6 months. I'm happy to check in again then, where I fully expect to be again told: "If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing." At least until then, I'm renewing my membership in the Boring Technology Club: https://boringtechnology.club/

[1] https://en.wikipedia.org/wiki/Stone_Soup

◧◩◪◨⬒⬓⬔⧯
2019. sksiso+gT1[view] [source] [discussion] 2025-06-03 14:06:35
>>capiki+vP1
https://the-decoder.com/openai-quietly-funded-independent-ma...

You mean these?

I use AI every day, but you've got hundreds of billions of dollars and Scam Altman (known for having no morals and playing dirty) et al. on "your" side. The only thing AI skeptics have is anecdotes and time. Having a principled argument isn't really possible.

◧◩◪◨⬒
2020. ChrisM+nT1[view] [source] [discussion] 2025-06-03 14:07:50
>>dematz+KR1
Well, the old Java model, where you have dozens of small files for even the simplest applications, may be better for humans, but it's difficult to feed that to an LLM prompt. With the way I work, I can literally copy and paste. My files aren't so big that they choke the server, but they are big enough to encompass the whole domain. I use SwiftLint to keep my files from getting too massive, but I also like to keep things that are logically connected together.

Judge for yourself.

Here's the file I am working on: [0].

The issue was in this initializer: [1]. In particular, this line was missing: [2]. I had switched to using a UIButton as a custom view, so the callback only got the button, instead of the container UIBarButtonItem. I needed to propagate the tag into the button.

[0] https://github.com/LittleGreenViper/SwipeTabController/blob/...

[1] https://github.com/LittleGreenViper/SwipeTabController/blob/...

[2] https://github.com/LittleGreenViper/SwipeTabController/blob/...

◧◩◪◨
2046. bwfan1+1Y1[view] [source] [discussion] 2025-06-03 14:39:29
>>Dylan1+4j1
Have you read the hilarious PRs that Copilot put out last week? They are here for your reference [1]. The humor is in the giant gap between what it can do and what the hype says it can do.

Can you show me one PR put out by any agent in any open-source repo with wide usage?

[1] https://www.reddit.com/r/ExperiencedDevs/comments/1krttqo/my...

◧◩◪◨⬒⬓
2048. cowlby+mY1[view] [source] [discussion] 2025-06-03 14:40:47
>>fc417f+kR
Very interesting. It grabbed that from the marketing at https://www.pronamel.us/why-pronamel/how-pronamel-works/ so it's def still fallible to marketing and sales as well.
◧◩
2050. Joel_M+BY1[view] [source] [discussion] 2025-06-03 14:42:29
>>ta_202+kX1
Slop adding implicit arbitrary higher order tensors to linear problems.

Sounds cool, but not useful... lol

Reminds one of a classic fictional conversation... =3

https://www.youtube.com/watch?v=X6WHBO_Qc-Q

◧◩◪◨⬒⬓⬔⧯▣
2080. caycep+B12[view] [source] [discussion] 2025-06-03 15:01:38
>>Fillig+Rr1
but then, if one needs to change communications style from human to AI, does this ethos then get tossed to the wind?

https://lkml.org/lkml/2012/12/23/75

◧◩◪
2086. creesc+f22[view] [source] [discussion] 2025-06-03 15:06:51
>>abe94+JN1
> I was thinking of writing a new story, and found http://sudowrite.com/ via an ad, an ai assistant for helping you write, its already used by a ton of journalists and serious writers, and am trying it out.

I am not seeing anything indicating it is actually used by a ton of journalists and serious writers. I highly doubt it is; the FAQ is also paper-thin as far as substance goes. I highly doubt they are training/hosting their own models, yet I see only vague third-party references in their privacy policy. Their pricing is less than transparent, given that they don't really explain how their "credits" translate to actual usage. They blatantly advertise this to students, which is problematic in itself.

This ignores all the other issues around depending so heavily on LLMs for your writing. This is an interesting quirk for starters: https://www.theguardian.com/technology/2024/apr/16/techscape... . But there are many more issues with relying so heavily on LLM tools for writing.

So this example, to me, actually exemplifies the overselling of capabilities, while handwaving away any potential issues, that is so prevalent in the AI space.

◧◩◪
2087. noname+k22[view] [source] [discussion] 2025-06-03 15:07:20
>>tptace+f61
It's kind of ironic to me that this is so often the example trotted out. Look at the BLS data sheet for job outlook: https://www.bls.gov/ooh/sales/travel-agents.htm#tab-6

> Employment of travel agents is projected to grow 3 percent from 2023 to 2033, about as fast as the average for all occupations.

The most recent year with data claims 68,800 people employed as travel agents in the US. It's not a boom industry by any means, but it doesn't appear they experienced the apocalypse that Hacker News believes they did, either.

I don't know how to easily find historical data, unfortunately. BLS publishes the excel sheets, but pulling out the specific category would have to be done manually as far as I can tell. There's this, I guess: https://www.travelagewest.com/Industry-Insight/Business-Feat...

It appears, at least, that what happened is this: though it may be easier than ever to plan your own travel, there are so many more people traveling these days than in the past that demand for travel agents hasn't crashed.

◧◩
2104. TwoFer+c52[view] [source] [discussion] 2025-06-03 15:26:10
>>gdubs+Z
It invalidates this CinemaSins nitpick on Alien completely

https://youtu.be/dJtYDb7YaJ4?si=5NuoXaW0pkGoBSJu&t=76

◧◩◪◨
2124. pvg+g92[view] [source] [discussion] 2025-06-03 15:50:40
>>noname+k22
https://www.vice.com/en/article/why-are-travel-agents-still-...

Has some stats. It seems pretty clear the interests of travel agents did not count for much in the face of technological change.

◧◩◪
2139. keeda+oc2[view] [source] [discussion] 2025-06-03 16:06:22
>>wpietr+KO1
> Core to Ptacek's point is that everything has changed in the last 6 months.

This was actually the only point in the essay with which I disagree, and it weakens the overall argument. Even 2 years ago, before agents or reasoning models, these LLMs were extremely powerful. The catch was, you needed to figure out what worked for you.

I wrote this comment elsewhere: >>44164846 -- Upshot: it took me months to figure out what worked for me, but AI enabled me to produce innovative (probably cutting-edge) work in domains I had little prior background in. Yes, the hype should trigger your suspicions, but if respectable people with no stake in selling AI, like @tptacek or @kentonv in the other AI thread, are saying similar things, you should probably take a closer look.

◧◩
2155. 59nadi+Hf2[view] [source] [discussion] 2025-06-03 16:23:00
>>jatins+QD
This is in line with "Programming As Theory Building"[0] (Peter Naur, 1985) that puts forward the idea that when one is building a system one builds a "theory" (in a sense a mapping of what, where and why) of that system that is essential for continued work on it. Once I internalized this idea it helped me make sense of why rewrites are not "never a good idea" and simultaneously why they are so fraught with peril, as well as what kind of onboarding actually makes sense for work on a system. It's also why I can never get on board with the passive "review code an LLM spits out" type of development; it just does not build theory as reliably, effectively and efficiently.

It's a good, very digestible paper, in my opinion; well worth the read.

[0] - https://gwern.net/doc/cs/algorithm/1985-naur.pdf

◧◩◪◨⬒⬓⬔
2174. malfis+Oi2[view] [source] [discussion] 2025-06-03 16:40:34
>>Workac+cS
I don't have to post my chat; someone else already posted a chat claiming ChatGPT gave them correct answers when the answers ChatGPT gave them were all kinds of wrong.

See: >>44164633 and my analysis of the results: >>44171575

You can send me all your money via paypal, money order or check.

◧◩◪◨
2194. supera+vm2[view] [source] [discussion] 2025-06-03 17:01:01
>>creesc+f22
Hey co-founder of Sudowrite here. We indeed have thousands of writers paying for and using the platform. However, we aim to serve professional novelists, not journalists or students. We have some of both using it, but it's heavily designed and priced for novelists making a living off their work.

We released our own fiction-specific model earlier this year - you can read more about it at https://www.sudowrite.com/muse

A much-improved version 1.5 came out today -- it's preferred 2-to-1 vs Claude in blind tests with our users.

You're right on the faq -- alas, we've been very product-focused and haven't done the best job keeping the marketing site up to date. What questions do you wish we'd answer there?

◧◩◪◨⬒⬓⬔
2211. skydha+4p2[view] [source] [discussion] 2025-06-03 17:15:10
>>rerdav+Rg2
Results with gemini

https://pastebin.com/6b4yhfYw

A 5-second search on DDG ("easyeffects") and a 10-second navigation on GitHub.

https://github.com/wwmm/easyeffects/blob/master/src/plugin_b...

But that is GPL 3.0, and a lot of people want to use the license-laundering LLM machine.

N.B. I already know about easyeffects from when I was looking for a software equalizer.

EDIT

Another 30 seconds of exploration ("pipewire" on DDG, finding the main site, then going to the documentation page and the tutorial section).

https://docs.pipewire.org/audio-dsp-filter_8c-example.html

There are a lot of ways to find truthful information without playing Russian roulette with an LLM.

◧◩◪◨⬒⬓⬔⧯
2254. Workac+pv2[view] [source] [discussion] 2025-06-03 17:52:52
>>malfis+Oi2
I'm not gonna go all out, this thread is gonna be dead soon, but here are all the toothpastes ChatGPT was referring to:

[1]https://dentalhealth.com/products/fluoridex-sensitivity-reli...

[2]https://www.fireflysupply.com/products/hello-naturally-white...

[3]https://dailymed.nlm.nih.gov/dailymed/fda/fdaDrugXsl.cfm?set...

(Seems Tom's recently discontinued this; they mention it on their website, but say customers didn't like it)

[4]https://www.jason-personalcare.com/product/sea-fresh-anti-ca...

[5]https://popularitems.com/products/autobrush-kids-fluoride-fo...

As far as I can tell these are all real products, and all meet the requirement of having fluoride and being SLS-free.

Since you did return, however, and that was half my bet, I suppose you are still entitled to half my life savings. But the amount is small, so maybe the knowledge of these new toothpastes is more valuable to you anyway.

◧◩
2259. dang+mw2[view] [source] [discussion] 2025-06-03 17:58:38
>>Gensho+Ea1
Could you please stop breaking the site guidelines, as you did here and in >>44167429 ? If you keep posting like that we're going to have to ban you again. I don't want to do that, but we can't have commenters abusing HN like this.

If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it. Note these, for example:

"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

"When disagreeing, please reply to the argument instead of calling names. 'That is idiotic; 1 + 1 is 2, not 3' can be shortened to '1 + 1 is 2, not 3."

"Please don't fulminate."

https://news.ycombinator.com/newsguidelines.html

◧◩◪◨⬒
2262. Karrot+Bw2[view] [source] [discussion] 2025-06-03 18:00:04
>>bwfan1+1Y1
An instance of dogfooding, but supposedly the last release of Aider [1] had the agent write 79% of its code.

[1]: https://github.com/Aider-AI/aider/blob/main/HISTORY.md#aider...

◧◩
2265. simonw+cx2[view] [source] [discussion] 2025-06-03 18:02:50
>>davidg+Wu2
Their client application (written in Go) is open source: https://github.com/superfly/flyctl

They have a few other notable open source components, I think these two are particularly interesting:

- https://github.com/superfly/corrosion

- https://github.com/superfly/litefs

(Disclosure: they sponsor some of my work.)

◧◩◪◨⬒⬓⬔⧯▣▦▧
2268. lechat+wx2[view] [source] [discussion] 2025-06-03 18:05:08
>>skydha+ge2
I would like to understand how you ideally imagine a person solving issues of this type. I'm for understanding things instead of hacking at them in general, and this tendency increases the more central the things to understand are to the things you like to do. However, it's a point of common agreement that just in the domain of computer-related tech, there is far more to learn than a person can possibly know in a lifetime, and so we all have to make choices about which ones we want to dive into.

I do not expect to go through the process I just described for more than a few hours a year, so I don't think the net loss to my time is huge. I think the most relevant counterfactual is that I don't learn anything about how these things work at all and cope with my problem staying unfixed. I don't think this is unusual behavior, to the degree that it's, I think, a common point of humor among Linux users: https://xkcd.com/963/ https://xkcd.com/456/

This is not to mention issues that are structurally similar (in the sense that search is expensive but verification is cheap, and the issue is generally esoteric so there are reduced returns to learning) but don't necessarily have anything to do with the Linux kernel: https://github.com/electron/electron/issues/42611

I wonder if you're arguing against a strawman that thinks that it's not necessary to learn anything about the basic design/concepts of operating systems at all. I think knowledge of it is fractally deep and you could run into esoterica you don't care about at any level, and as others in the thread have noted, at the very least when you are in the weeds with a problem the LLM can often (not always) be better documentation than the documentation. (Also, I actually think that some engineers do on a practical level need to know extremely little about these things and more power to them, the abstraction is working for them.)

Holding what you learn constant, it's nice to have control about in what order things force you to learn them. Yak-shaving is a phenomenon common enough that we have a term for it, and I don't know that it's virtuous to know how to shave a yak in-depth (or to the extent that it is, some days you are just trying to do something else).

◧◩
2269. gitaar+Cx2[view] [source] [discussion] 2025-06-03 18:05:43
>>habosa+VM
Every new major technology always endangers the status quo.

https://chatgpt.com/share/683f3932-fce0-8012-a108-4b70c3e5fd...

Things change and it's scary, but it usually works out. Or at least we just get used to it.

◧◩◪◨
2289. rerdav+yB2[view] [source] [discussion] 2025-06-03 18:28:04
>>noname+k22
https://fred.stlouisfed.org/series/LEU0254497900A

40% of all travel agent jobs lost between 2001 and 2025. Glad I'm not a travel agent.

◧◩◪◨⬒⬓
2365. supera+vR2[view] [source] [discussion] 2025-06-03 20:02:06
>>creesc+Tz2
Wow, so many assumptions here that don't make sense to me, but I realize we all have different perspectives on this stuff. Thank you for sharing yours! I really do appreciate it.

I won't go line-by-line here defending the cutesy copy and all that since it's not my job to argue with people on the internet either… but on a few key points that interested me:

- language support: I don't believe we're being disingenuous. Sudowrite works well in many languages. We have authors teaching classes on using Sudowrite in multiple languages. In fact, there's one on German tomorrow and one on French next week: https://lu.ma/sudowrite Our community runs classes nearly every day.

- student usage - We do sometimes offer a student discount when people write in to ask for it, and we've had multiple college and high school classes use Sudowrite in writing classes. We'll often give free accounts to the class when professors reach out. I don't believe AI use in education is unethical. I think AI as copilot is the future of most creative work, and it will seem silly for teachers not to incorporate these tools in the future. Many already are! All that said, we do not market to students as you claim. Not because we think it's immoral -- we do not -- but because we think they have better options. ChatGPT is free, students are cheap. We make a professional tool for professional authors, and it is neither free nor cheap. It would not make sense for our business to market to students.

- press quotes -- Yes, we quote journalists because they're the ones who've written articles about us. You can google "New Yorker sudowrite" etc and see the articles. Some of those journalists also write fiction -- that one who wrote the New Yorker feature had a book he co-wrote with AI reviewed in The New York Times.

> I then noticed it was last updated in 2020? I highly doubt you guys have been around for that long

So many of these objections feel bizarre to me because they're trivial to fact-check. Here's a New York Times article that mentions us, written in 2020. We were one of the first companies to use LLMs in this wave and sought and gained access to GPT-3 prior to public API availability. https://www.nytimes.com/2020/11/24/science/artificial-intell...

◧◩◪◨⬒
2416. a4isms+x33[view] [source] [discussion] 2025-06-03 21:13:35
>>onemor+UL2
I have been in this industry since the mid 80s. I can't tell you how many people worry that I can't handle change because as a veteran, I must cling to what was. Meanwhile, of course, the reason I am still in the industry is because of my plasticity. Nothing is as it was for me, and I have changed just about everything about how I work multiple times. But what does stay the same all this time are people and businesses and how we/they behave.

Which brings me to your comment. The comparison to Uber drivers is apt, and to use a fashionable word these days, the threat to people and startups alike is "enshittification." These tools are not sold, they are rented. Should a few behemoths gain effective control of the market, we know from history that we won't see these tools become commodities and nearly free, we'll see the users of these tools (again, both people and businesses) squeezed until their margins are paper-thin.

Back when articles by Joel Spolsky regularly hit the top page of Hacker News, he wrote "Strategy Letter V:" https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/

The relevant takeaway was that companies try to commoditize their complements, and for LLM vendors, every startup is a complement. A brick-and-mortar metaphor is that of a retailer in a mall. If you as a retailer are paying more in rent than you're making, you are "working for the landlord," just as if you are making less than 30% of profit on everything you sell or rent through Apple's App Store, you're working for Apple.

I once described that as "Sharecropping in Apple's Orchard," and if I'm hesitant about the direction we're going, it's not anything about clinging to punch cards and ferromagnetic RAM, it's more the worry that it's not just a question of programmers becoming enshittified by their tools, it's also the entire notion of a software business "Sharecropping the LLM vendor's fields."

We spend way too much time talking about programming itself and not enough about whither the software business if its leverage is bound to tools that can only be rented on terms set by vendors.

--------

I don't know for certain where things will go or how we'll get there. I actually like the idea that a solo founder could create a billion-dollar company with no employees in my lifetime. And I have always liked the idea of software being "Wheels for the Mind," and we could be on a path to that, rather than turning humans into "reverse centaurs" that labour for the software rather than the other way around.

Once upon a time, VCs would always ask a startup, "What is your Plan B should you start getting traction and then Microsoft decides to compete with you/commoditize you by giving the same thing away?" That era passed, and Paul Graham celebrated it: https://paulgraham.com/microsoft.html

Then when startups became cheap to launch—thank you increased tech leverage and cheap money and YCombinator industrializing early-stage venture capital—the question became, "What is your moat against three smart kids launching a competitor?"

Now I wonder if the key question will bifurcate:

1. What is your moat against somebody launching competition even more cheaply than smart kids with YCombinator's backing, and;

2. How are you insulated against the cost of load-bearing tooling for everything in your business becoming arbitrarily more expensive?

◧◩◪◨⬒⬓⬔⧯▣▦▧▨◲◳⚿⛋⬕⬚⧄⧅
2421. simonw+373[view] [source] [discussion] 2025-06-03 21:39:30
>>sarche+153
No matter how hard I train, my fingers will never be able to move fast enough to output 100 lines of code in 15 seconds.

When I get to writing actual production code with LLMs I treat them more as typing assistants than anything else: https://simonwillison.net/2025/Mar/11/using-llms-for-code/#t...

◧◩◪◨⬒⬓⬔
2428. godels+jb3[view] [source] [discussion] 2025-06-03 22:06:15
>>tuhlat+B13

  > I took this to mean "good code within the domain of software engineering," not "good code within the universe of possible programs.
We both mean the same thing. The reasonable one. The only one that even kinda makes sense: good enough code

  > vast majority of production software has metrics/unit tests that could be attached and subsequently hillclimbed via RL
Yes, hill climbed. But that's different than "towards good"

Here's the difference[0]. You'll find another name for Goodhart's Law in any intro ML course. Which is why it is so baffling that 1) this is contentious and 2) it is the status quo in research now.

Your metrics are only useful if you understand them

Your measures are only as good as your attention

And it is important to distinguish metrics from measures. They are different things. Both are proxies
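
A toy version of the divergence (numbers mine, purely illustrative):

    // Toy Goodhart illustration (mine): hill-climb a proxy metric and
    // watch it diverge from the true objective it stands in for.
    #include <cstdio>

    double Proxy(double x) { return x; }               // proxy: "more is always better"
    double TrueGoal(double x) { return x * (4 - x); }  // true objective: peaks at x = 2

    int main() {
        double x = 0;
        for (int step = 0; step < 8; ++step) {
            x += 1.0;  // greedily climb the proxy
            std::printf("x=%.0f proxy=%.0f true=%.0f\n", x, Proxy(x), TrueGoal(x));
        }
        // The proxy rises forever; the true objective peaks at x=2, then collapses.
    }

The optimizer never did anything wrong. The proxy did exactly what proxies do.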

  > Your examples of tasks that can't easily be optimized (image fidelity, song quality, etc.) seem out of scope to me
Maybe you're unfamiliar with diffusion models?[1]

They are examples where it is hopefully clearer that these things are hard to define. If you have good programming skills you should be able to make the connection back to what this has to do with my point. If not, I'm actually fairly confident GPT will be able to do so. There's more than enough in its training data to do that.

[0] https://en.wikipedia.org/wiki/Goodhart%27s_law

[1] https://stability.ai/

◧◩◪◨⬒⬓⬔
2434. godels+Ec3[view] [source] [discussion] 2025-06-03 22:16:02
>>tptace+JF2

  > You're here using "ground truth" in some kind of grand epistemic sense
I used the word "ground truth" because you did!

  >> in agent loops with access to ground truth about whether things compile and pass automatic acceptance.
Your critique about "my usage of ground truth" is the same critique I'm giving you about it! You really are doing a good job at making me feel like I'm going nuts...

  > the information an LLM natively operates with,
And do you actually know what this is?

I am an ML researcher, you know. And one of those who keeps saying "you should learn the math." There's a reason for this: it is really connected to what you're talking about here. They are opaque, but they sure aren't black boxes.

And it really sounds like you're thinking the "thinking" tokens are remotely representative of the internal processing. You're a daily HN user, I'm pretty sure you saw this one[0].

I'm not saying anything OpenAI hasn't[1]. I just recognize that this applies to more than a very specific narrow case...

[0] >>44074111

[1] https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def563...

◧◩◪◨⬒⬓⬔
2461. simonw+Vk3[view] [source] [discussion] 2025-06-03 23:34:57
>>rerdav+rv2
I'd count ChatGPT memory as a feature of ChatGPT, not of the underlying LLM.

I wrote a bit about that here - I've turned it off: https://simonwillison.net/2025/May/21/chatgpt-new-memory/

◧◩◪◨
2469. wpietr+Om3[view] [source] [discussion] 2025-06-03 23:58:03
>>xpe+mV1
I think "the reality", at least as something involving a new paradigm, has yet to be established. I'll note that I heard plenty of similar talk about how developers just couldn't adapt six months or more ago. Promoters now can admit those tools were in fact pretty bad, because they now have something else to promote, but at the time those not rawdogging LLMs were dinosaurs under a big meteor.

I do of course agree that some people are just refusing to "wrap their minds around the changing world". But anybody with enough experience in tech can count a lot more instances of "the world is about to change" than "the world really changed". The most recent obvious example being cryptocurrencies, but there are plenty of others. [1] So I think there's plenty of room here for legitimate skepticism. And for just waiting until things settle down to see where we ended up.

[1] E.g. https://www.youtube.com/watch?v=b2F-DItXtZs

◧◩◪◨⬒⬓⬔
2482. sorami+4p3[view] [source] [discussion] 2025-06-04 00:22:20
>>bdangu+Lx
So when LLMs go around in circles, as they often do [1], that's a skill issue. But when they get it right some of the time, that's proof of superiority.

This is the kind of reasoning that dominates LLM zealotry. No evidence given for extraordinary claims. Just a barrage of dismissals of legitimate problems. Including the article in discussion.

All of this makes me have a hard time taking any of it seriously.

[1]: >>44050152

◧◩◪◨
2486. sorami+uq3[view] [source] [discussion] 2025-06-04 00:42:40
>>Hammer+ed
At least the actual examples are underwhelming.

>>44050152

◧◩◪◨
2503. simonw+IB3[view] [source] [discussion] 2025-06-04 03:23:11
>>danShu+Ht3
I've kind of lost track of where we disagree here, to be honest.

Maybe we need to drop "easier" and "harder" and talk about speed.

I can write software faster with LLMs, without sacrificing quality - in fact I can get higher quality software because doing things "better" doesn't mean they take longer.

I derive enjoyment from building good stuff. If I can do it faster I can build more of it, which increases my enjoyment.

I wrote about this a couple of years ago: "AI-enhanced development makes me more ambitious with my projects": https://simonwillison.net/2023/Mar/27/ai-enhanced-developmen...

That's still true today, only more so because the models are significantly more competent than they were in March 2023!

◧◩◪◨
2529. sorami+xZ3[view] [source] [discussion] 2025-06-04 08:18:49
>>echelo+ed2
Actually, I agree. It won't be long before businesses handle software engineering like Google does "support." You know, that robotic system that sends out passive-aggressive, mocking emails to people who got screwed over by another robot that locked them out of their digital lives for made-up reasons [1]. It saves the suits a ton of cash while letting them dodge any responsibility for the inevitable harm it'll cause to society. Mediocrity will be seen as a feature, and the worst part is, the zealots will wave it like a badge of honor.

[1]: >>26061935

◧◩◪
2536. sorami+f54[view] [source] [discussion] 2025-06-04 09:23:33
>>tptace+f61
Let's be real. Software engineers are skeptical right now not because they believe the robots are better than them. Quite the opposite: the suits will replace software engineers despite the robots' mediocrity.

It was just 2 weeks ago that the utter incompetence of these robots was on full public display [1]. But none of that will matter to greedy corporate executives, who will prioritize short-term cost savings. They will hop from company to company, personally reaping the benefits while undermining, with robot slop, essential systems that users and society rely on. That's part of the reason the C-suites are overhyping the technology. After all, no rich executive has ever faced consequences for behaving this way.

It's not just software engineering jobs that will take a hit. Society as a whole will suffer from the greedy recklessness.

[1]: >>44050152

◧◩◪◨⬒⬓⬔⧯▣▦▧
2539. mwarke+ig4[view] [source] [discussion] 2025-06-04 11:16:46
>>jhatem+cI3
Yes, some think it’s happening now: https://www.aisnakeoil.com/p/is-ai-progress-slowing-down
◧◩◪◨
2541. incomi+Ag4[view] [source] [discussion] 2025-06-04 11:21:15
>>gspenc+er2
>First, I work on a very massive codebase, with a large engineering organization.

I sure don't. I'm a cybersecurity 0day writer, or Linux bash scripter, or yikes coder, who has made some bad turns and ended up with some open source projects in Python that absolutely nobody would call large or massive.

>The benefit of unit tests is that, at their best, they give you a safety net for refactoring existing code. If you change the implementation of a system under test, and the tests pass, you know you didn't introduce a breaking change.

Never refactor. This is literally why everyone loves object-oriented and paints themselves into a corner that blows up in their face.

>Now your unit test is not only failing to encourage you to refactor code... it's actively DISCOURAGING you from refactoring code. In this case, the unit test is providing DISVALUE rather than value.

That sounds perfect to me. Never refactor: you end-of-life or version the old code out and replace it with the new, better code. That's how we got WPA3, TLS 1.3, SSH-2, IKEv2, and HTTP/3.

It's fine to keep replacing until you get it right.

>So the fact that a) you think unit tests are a waste of time and b) you look at AI as a way to save you from a chore ... tells me that you have no business ever writing unit tests with or without AI. Please stop. You are making the world worse by leveraging an LLM to do these things for you.

We live in very different worlds. I do like the idea that you feel like I'm making the world worse by having AI do the chores.

>I have NEVER looked at writing a unit test by hand as a "chore" or as a "waste of time."

Me neither. I just don't do them. With how awesome AI has been... I'm probably going to have the time to read, for the first time: https://docs.djangoproject.com/en/5.2/topics/testing/

>I often write my tests before even writing my implementation code, because doing so helps me think through both the design and requirements of my code... and gives me a little mini-sandbox context where I can make sure that the brand new code that I am writing is doing what I want it to. It's a problem solving tool. Not something to be done after the fact as a chore.

I don't code anything that complicated. I come from Linux, where they call it the "Unix philosophy": do one thing and do it well. Kind of akin to the KISS principle.

Write clean, simple code that documents itself and legitimately can't go wrong. Especially when some jerk like me comes along and throws a fuzzer at it or, worse, user input.
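To illustrate, a made-up minimal sketch of the kind of thing I mean (hypothetical function, not from any real project):

  def count_lines(path):
      """Count the lines in a file. Does one thing; not much can go wrong."""
      with open(path, "rb") as f:
          # Binary mode: arbitrary fuzzer-supplied bytes can't trigger decode errors.
          return sum(1 for _ in f)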

>"Write Documentation" is not writing code. And if you don't read the documentation you're generating, no one else will. So what's the point of having it in the first place if no one reads it? Food for thought.

Good point.

>"Find problems" I see as being akin to a spell check, or the red squigglies when you have a syntax error.

Community PyCharm has all that for free. The "find problems" feature is on an entirely different level. I'm so very sold.

I just wrote a stupid def test(): that returns a string from inside a try-except. The linter has not a single thing to say. The AI, on the other hand:

1. *Try Block Contains No Risk Operations*:
   - The try-except block is wrapping a simple string operation that won't raise exceptions
   - Makes the try-except block unnecessary
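Roughly, the shape of it (a reconstruction from the description, not the exact code):

  def test():
      try:
          # A plain string expression: nothing in here can raise,
          # which is exactly what the AI flagged.
          return "hello"
      except Exception:
          return ""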

>I do see the value of having ML tech within the internal IDE implementation. We've been using ML for email SPAM filters for decades. There are interesting problems that can be solved there. But this is an area where I want my IDE to flag something for me. It's not me actively prompting an LLM to generate code for me.

Since I have public open-source projects on GitHub, I got Copilot for free like 3 years ago, and my biggest complaint was that it was generating garbage code and forcing it on me. It got so annoying I disabled it and stopped using it. I'm sure it has improved since then.

PyCharm's AI occasionally has a tab completion that's really good, but it isn't forced on you.

◧◩◪◨
2542. mwarke+Kg4[view] [source] [discussion] 2025-06-04 11:22:37
>>keeda+oc2
AI posts (including this one) are all over his employer's blog lately, so there's some stake (the Fly MCP, https://fly.io/blog/fuckin-robots/, etc.).
2570. dawnbr+B15[view] [source] 2025-06-04 16:18:21
>>tablet+(OP)
Throwing my two cents in here...I think there's a disconnect between what AI advocates want, and what everyone else wants.

The arguments against genAI tend to point out things like:

1. Its output is unreliable at best
2. That output often looks correct to an untrained eye and requires expert intervention to catch serious mistakes
3. The process automates away a task that many people rely on for income

And the response from genAI advocates tends to be dismissive...and I suspect it is, in part, because that last point is a positive for many advocates of genAI. Nobody wants to say it out loud, but when someone on Reddit or similar claims that even a 10% success rate outweighs the 90% failure rate, what they mean is most likely "A machine that works 10% of the time is better than a programmer who works 60-80% of the time because the machine is more than 6-to-8-times cheaper than the programmer".

There's also the classic line about how automation tends to create more jobs in the future than it destroys now, which itself is a source of big disconnects between pro-genAI and anti-genAI crowds--because it ignores a glaring issue: Just because there's gonna be more jobs in the future, doesn't mean I can pay rent with no job tomorrow!

"You can write an effective coding agent in a week" doesn't reassure people because it doesn't address their concerns. You can't persuade someone that genAI isn't a problem by arguing that you can easily deploy it, because part of the concern is that you can easily deploy it. Also, "you’re not doing what the AI boosters are doing" is flat-out incorrect, at least if you're looking at the same AI boosters I am--most of the people I've seen who claim to be using generated code say they're doing it with Claude, which--to my knowledge--is just an LLM, albeit a particularly advanced one. I won't pretend this is anything but anecdata, but I do engage with people who aren't in the "genAI is evil" camp, and...they use Claude for their programming assistance.

"LLMs can write a large fraction of all the tedious code you’ll ever need to write" further reinforces this disconnect. This is exactly why people think this tech is a problem.

The entire section on "But you have no idea what the code is!" falls apart the moment you consider real-world cases, such as CVE-2025-4143 (https://nvd.nist.gov/vuln/detail/cve-2025-4143), where a programmer who is a self-described expert working with Claude--who emphasizes that he checked over the results with a fine-toothed comb, and that he did this to validate his own skepticism about genAI!--missed a fundamental mistake in implementing OAuth that has been common knowledge for a long while. The author is correct in that reading other people's code is part of the job...but this is difficult enough when the thing that wrote the code can be asked about its methods, and despite advances in giving LLMs a sort of train of thought, the fact remains that LLMs are designed to output things that "look truth-y", not things that are logically consistent. (Ah, but we're not talking about LLMs, even though kentonv tells us that he just used an LLM. We're talking about agentic systems. No true AI booster would "just" use an LLM...)

I actually agree with the point about how the language can catch and point out some of the errors caused by hallucination, but...I can generate bad function signatures just fine on my own, thank you! :P In all seriousness, this addresses basically nothing about the actual point. The problem with hallucination in a setting like this isn't "the AI comes up with a function that doesn't exist", that's what I'm doing when I write code. The problem with hallucination is that sometimes that function which doesn't exist is my RSA implementation, and the AI 'helpfully' writes an RSA implementation for me, a thing that you should never fucking do because cryptography is an incredibly complex thing that's easy to fuck up and hard to audit, and you really ought to just use a library...a thing you also shouldn't leave up to your AI (https://www.theregister.com/2025/04/12/ai_code_suggestions_s...). You can't fix that with a language feature, aside from having a really good cryptography library built into the language itself, and as much as I'd love to have a library for literally everything I might want to do in a language...that's not really feasible.
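To make "just use a library" concrete, here's roughly what that looks like with the pyca/cryptography package--a sketch, assuming RSA-OAEP even fits the use case, and real crypto code should still be reviewed by someone who knows the field:

  from cryptography.hazmat.primitives import hashes
  from cryptography.hazmat.primitives.asymmetric import padding, rsa

  # Key generation from a vetted library, not a hand-rolled (or AI-rolled) RSA.
  private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

  # OAEP padding: exactly the kind of detail hand-rolled implementations get wrong.
  oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                      algorithm=hashes.SHA256(), label=None)
  ciphertext = private_key.public_key().encrypt(b"secret", oaep)
  assert private_key.decrypt(ciphertext, oaep) == b"secret"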

"Does an intern cost $20/month? Because that’s what Cursor.ai costs," says the blog author, as if that's supposed to reassure me. I'm an intern. My primary job responsibility is getting better at programming so I can help with the more advanced things my employer is working on (for the record, these thoughts are my own and not those of my employer). It does not make me happy to know that Cursor.ai can replace me. This also doesn't address the problem that, frankly, large corporations aren't going to replace junior developers with these tools; they're going to replace senior developers, because senior developers cost more. Does a senior engineer cost 20 dollars a month? Because that's what Cursor.ai costs!

...and the claim that open source is just as responsible for taking jobs is baffling. "We used to pay good money for databases" is not an epic own, it is a whole other fucking problem. The people working on FOSS software are in fact very frustrated with the way large corporations use their tools without donating so much as a single red cent! This is a serious problem! You know that XKCD about the whole internet being held up by a project maintained by a single person in his free time? That's what you're complaining about! And that guy would love to be paid to write code that someone can actually fucking audit, but nobody will pay him for it, and instead of recognizing that the guy ought to be supported, you argue that this is proof that nobody else deserves to be supported. I'm trying to steelman this blogpost, I really am, but dude, you fundamentally have this point backwards.

I hope this helps others understand why this blogpost doesn't actually address any of my concerns, or the concerns of other people I know. That's kind of the best I can hope for here.

◧◩◪◨⬒⬓⬔⧯▣
2596. godels+JL5[view] [source] [discussion] 2025-06-04 20:38:53
>>tuhlat+8h3

  > Now I'm confused ... that even mathematical proofs could be "terrible"? That doesn't make sense to me.
You know there are plenty of ways to prove things, right? There isn't just one proof. Here are a few proofs that pi is irrational[0]. The list is not comprehensive.

Treat that like you treat code. They all produce the same final result. They're all correct. But is one better than another? Yes, yes it is. Which one, though, depends on context.
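The code version of that point, as a made-up example: two correct implementations of the same check, where which one is "better" depends on context.

  def has_duplicate_quadratic(xs):
      # O(n^2) time, O(1) extra space; works even on unhashable items.
      return any(x == y for i, x in enumerate(xs) for y in xs[i + 1:])

  def has_duplicate_linear(xs):
      # O(n) time, O(n) extra space; requires hashable items.
      seen = set()
      for x in xs:
          if x in seen:
              return True
          seen.add(x)
      return False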

  > and crucially -- measurable
This is probably a point of contention. Measuring is far more difficult than people think. A lot of work goes into creating measurements, and we get a nice ruler at the end. The problem isn't just that initial complexity; it's that every measure is a proxy. Even your meter stick doesn't measure exactly a meter. What distinguishes the engineer from the hobbyist is the knowledge of alignment.

  How well does my measure align with what I intend to measure?
That's a very hard problem. How often do you ask yourself that? I'm betting not enough. Frankly, most things aren't measurable.

[0] https://proofwiki.org/wiki/Pi_is_Irrational#:~:text=Hence%20...

◧◩◪◨⬒⬓
2613. wpietr+3o6[view] [source] [discussion] 2025-06-05 02:00:29
>>tptace+Xn3
I think it depends a lot on what you think "wrong position" means. Skepticism only really goes wrong when it refuses to see the truth in what it's questioning long past the point where that refusal is reasonable. I don't think we're there yet. For example, questions like "What is the long-term effect on a codebase?" require us to actually see the long term. And there are legitimate questions about the ROI of learning and re-learning rapidly changing tools. What's worth it in your situation may not be worth it in others.

I also think hype cycles and actual progress can have a variety of relationships. After Bubble 1.0 burst, there were years of exciting progress without a lot of hype. Maybe we'll get something similar here, as reasonable observers are already seeing the hype cycle falter. E.g.: https://www.economist.com/business/2025/05/21/welcome-to-the...

And of course, it all hinges on your being right. I get that you're convinced, but if you want to be thorough, you have to look at the other side of it.

◧◩◪◨
2639. sander+hA7[view] [source] [discussion] 2025-06-05 14:49:02
>>Hammer+ed
There are lots of concrete examples in this article: https://www.semafor.com/article/06/03/2025/how-vibe-coding-i...

Do I think those rise to "case studies"? No. But to another commenter's point, detailed and rigorous case studies have always been hard to come by for any productivity process or technology.

I also think that article is hype, but it's not true that it's vague.

◧◩
2654. blenda+hi9[view] [source] [discussion] 2025-06-06 08:16:17
>>Verdex+5x
The Primeagen did this. He locked himself in for 7 weeks and tried to build a game using only Cursor. The videos are more entertainment-style, but this is the closest thing I know of.

https://www.youtube.com/watch?v=6x3weFV9YI4

◧◩◪◨⬒⬓⬔⧯▣▦▧▨◲◳
2663. simonc+gjb[view] [source] [discussion] 2025-06-07 01:03:54
>>tptace+P13
> I don't accept the premise that "training on" and "copying" are the same thing...

Nor do I. Training and copying are clearly different things... and if these tools had never emitted -verbatim- nontrivial chunks of the code they'd ingested [0], I'd be much less concerned about them. But as it stands now, some-to-many of the companies that build and deploy these machines clearly didn't care to ensure that their machines simply wouldn't plagiarize.

I have a bit more commentary related to whether what these companies are doing should be permitted. [1]

[0] Based on what I've seen, when it happens, it often comes with incorrect copyright and/or license notices, or with none of the verbiage that the copied code's license requires for non-trivial reproductions.

[1] >>44166983

◧◩◪◨⬒⬓⬔⧯▣▦
2676. b3mora+PEe[view] [source] [discussion] 2025-06-08 20:50:02
>>rfrey+KE1
Because it's rules for us and not for them. If I take Microsoft's code and "transform" it, I get sued. If Microsoft takes everyone else's code and "transforms" it (and sells it back to us), well, that's just business, pal. Thomas's argument completely misses this point.

EDIT to add, I said this more completely a while ago: >>34381996

[go to top]