zlacker

[parent] [thread] 62 comments
1. kace91+(OP)[view] [source] 2026-02-04 21:59:04
>The people really leading AI coding right now (and I’d put myself near the front, though not all the way there) don’t read code. They manage the things that produce code.

I can’t imagine any other example where people voluntarily move to a black box approach.

Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.

What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

Are these people just handing off the review process to others? Are they unable to read code and hiding it? Why would you handicap yourself this way?

replies(16): >>manmal+l1 >>Charle+y2 >>csalle+T2 >>notepa+83 >>Aeolun+s3 >>Xirdus+N3 >>ForHac+O4 >>weikju+d5 >>rainco+E7 >>seanmc+L7 >>hjoutf+h8 >>strayd+49 >>AlexCo+Gd >>eikenb+ze >>bloomc+0o >>andyfe+oz
2. manmal+l1[view] [source] 2026-02-04 22:05:43
>>kace91+(OP)
> I can’t imagine any other example where people voluntarily move to a black box approach.

Anyone overseeing work from multiple people has to? At some point you have to let go and trust people’s judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I’m not one of them btw) must rely on higher level reports. Maybe drilling into this or that piece of code occasionally.

replies(3): >>re-thc+13 >>kace91+f4 >>ink_13+F5
3. Charle+y2[view] [source] 2026-02-04 22:11:26
>>kace91+(OP)
AI-assisted coding is not a black box in the way that managing an engineering team of humans is. You see the model "thinking", you see diffs being created, and occasionally you intervene to keep things on track. If you're leveraging AI professionally, any coding has been preceded by planning (the breadth and depth of which scale with the task) and test suites.
4. csalle+T2[view] [source] 2026-02-04 22:13:33
>>kace91+(OP)
> Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.

The output of code isn't just the code itself, it's the product. The code is a means to an end.

So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal.

replies(5): >>alanbe+44 >>kace91+85 >>strayd+Ga >>add-su+ic >>6510+dQ
◧◩
5. re-thc+13[view] [source] [discussion] 2026-02-04 22:13:52
>>manmal+l1
> Anyone overseeing work from multiple people has to?

That's not a black box though. Someone is still reading the code.

> At some point you have to let go and trust people’s judgement

Where are the people in this case?

> People who do that (I’m not one of them btw) must rely on higher level reports.

Does such a thing exist here? Just "done".

replies(1): >>manmal+J8
6. notepa+83[view] [source] 2026-02-04 22:14:09
>>kace91+(OP)
People care about results. Better processes need to produce better results. This is programming, not a belief system where you have to adhere to some view or else.
7. Aeolun+s3[view] [source] 2026-02-04 22:15:51
>>kace91+(OP)
> What is the logic here?

It is right often enough that your time is better spent testing the functionality than the code.

Sometimes it’s not right, and you need to re-instruct (often) or dive in (not very often).

replies(1): >>kace91+f6
8. Xirdus+N3[view] [source] 2026-02-04 22:18:02
>>kace91+(OP)
> I can’t imagine any other example where people voluntarily move to a black box approach.

I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much.

replies(1): >>ink_13+i5
◧◩
9. alanbe+44[view] [source] [discussion] 2026-02-04 22:18:51
>>csalle+T2
Right, it seems the appropriate analogy is the shift from analog photo developers to digital-camera photographers.
◧◩
10. kace91+f4[view] [source] [discussion] 2026-02-04 22:20:01
>>manmal+l1
>At some point you have to let go and trust people’s judgement.

Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.

>Reading and understanding the whole output of 9 concurrently running agents is impossible.

I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurrent agents?

I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. That’s already a giant advancement; why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.

replies(1): >>wtetzn+aC
11. ForHac+O4[view] [source] 2026-02-04 22:22:59
>>kace91+(OP)
>Imagine taking a picture on autoshot mode

Almost everyone does this. Hardly anyone taking pictures understands what f-stop or focal length are. Even those who do seldom adjust them.

There are dozens of other examples where people voluntarily move to a black box approach. How many Americans drive a car with a manual transmission?

replies(2): >>weikju+C5 >>sigseg+xn1
◧◩
12. kace91+85[view] [source] [discussion] 2026-02-04 22:24:36
>>csalle+T2
>The output of code isn't just the code itself, it's the product. The code is a means to an end.

I’ll bite. Is this person manually testing everything that one would regularly unit test? Or writing black box tests that he knows are correct because they were manually written?

If not, you’re not reviewing the product either. If yes, it’s less time consuming to actually read and test the damn code.

replies(1): >>Curiou+T6
13. weikju+d5[view] [source] 2026-02-04 22:24:51
>>kace91+(OP)
Don’t read the code, test for desired behavior, miss out on all the hidden undesired behavior injected by malicious prompts or AI providers. Brave new world!
replies(1): >>thefz+D6
◧◩
14. ink_13+i5[view] [source] [discussion] 2026-02-04 22:25:10
>>Xirdus+N3
So... things where the producer doesn't respect the audience? Because any such analysis would be worth as much as a 4.5 hour atonal bass solo.
replies(1): >>sroeri+Fr
◧◩
15. weikju+C5[view] [source] [discussion] 2026-02-04 22:26:17
>>ForHac+O4
You missed the rest of the analogy though, which is the part where the photo is not reviewed before handing it over to the client.
◧◩
16. ink_13+F5[view] [source] [discussion] 2026-02-04 22:26:23
>>manmal+l1
An AI agent cannot be held accountable
replies(1): >>manmal+X8
◧◩
17. kace91+f6[view] [source] [discussion] 2026-02-04 22:30:32
>>Aeolun+s3
I can’t imagine retesting all the functionality of a well-established product for possible regressions not being stupidly time consuming. This is the very reason why we have unit tests in the first place, and why they are far more numerous than end-to-end ones.
◧◩
18. thefz+D6[view] [source] [discussion] 2026-02-04 22:32:43
>>weikju+d5
You made me imagine AI companies maliciously injecting backdoors in generated code no one reads, and now I'm scared.
replies(3): >>gibson+qa >>djeast+sw >>bandra+GB
◧◩◪
19. Curiou+T6[view] [source] [discussion] 2026-02-04 22:33:39
>>kace91+85
I mostly ignore code; I lean on specs + tests + static analysis. I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions. I push very high test coverage on all my projects (85%+), and part of the way I build is "testing ladders", where I have the agent create progressively bigger integration tests until I hit e2e/manual validation.
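To make that concrete, a ladder might start roughly like the sketch below. This is a simplified illustration, not my exact setup: the pricing code is a stand-in for whatever the agent actually wrote, the markers would be registered in pyproject.toml, and the 85% gate is just pytest-cov's --cov-fail-under flag.

    # test_ladder.py -- illustrative sketch; the "pricing" logic is a stand-in,
    # not real product code. Typical gate before any human looks at the diff:
    #   pytest -m "unit or integration" --cov --cov-fail-under=85
    import pytest

    def apply_discount(price: float, percent: float) -> float:
        """Stand-in for code the agent would have written."""
        return price * (1 - percent / 100)

    class OrderService:
        """Stand-in service: composes pricing with a flat 20% tax."""
        TAX = 0.20
        def total(self, price: float, discount_percent: float) -> float:
            return apply_discount(price, discount_percent) * (1 + self.TAX)

    @pytest.mark.unit
    def test_discount_is_applied():
        # Bottom rung: pure function, no I/O; most tests live here.
        assert apply_discount(100, 10) == 90

    @pytest.mark.integration
    def test_order_total_combines_discount_and_tax():
        # Middle rung: a few components wired together.
        assert OrderService().total(100, 10) == pytest.approx(108.0)

    @pytest.mark.e2e
    def test_checkout_happy_path():
        # Top rung: in a real project this drives the running app;
        # here it's a placeholder before manual validation/dogfooding.
        pytest.skip("requires the app running locally")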
replies(2): >>kace91+Y8 >>strayd+Pa
20. rainco+E7[view] [source] 2026-02-04 22:37:11
>>kace91+(OP)
> Because if you can read code, I can’t imagine poking the result with black box testing being faster.

I don't know... it depends on the use case. I can't imagine even the best front-end engineer can ever read HTML faster than looking at the rendered webpage to check if the layout is correct.

replies(1): >>nubg+fI
21. seanmc+L7[view] [source] 2026-02-04 22:38:03
>>kace91+(OP)
> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

The AI also writes the black box tests, what am I missing here?

replies(1): >>kace91+od
22. hjoutf+h8[view] [source] 2026-02-04 22:41:41
>>kace91+(OP)
Your metaphor is wrong.

Code is not the output. Functionality is the output, and you do look at that.

replies(1): >>kace91+Qd
◧◩◪
23. manmal+J8[view] [source] [discussion] 2026-02-04 22:44:20
>>re-thc+13
> Someone is still reading the code.

But you are not. That’s the point?

> Where are the people in this case?

Juniors build worse code than Codex. Their superiors also can’t check everything they do. They need to have some level of trust, even when juniors do dumb shit, or they can’t hire juniors.

> Does such a thing exist here? Just "done".

Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info compared to reading the output yourself, but it won’t be zero info.

◧◩◪
24. manmal+X8[view] [source] [discussion] 2026-02-04 22:45:31
>>ink_13+F5
Neither can employees, in many countries.
◧◩◪◨
25. kace91+Y8[view] [source] [discussion] 2026-02-04 22:45:36
>>Curiou+T6
>I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions

So a percentage of your code, based on your gut feeling, is left unseen by any human by the time you submit it.

Do you agree that this raises the chance of bugs slipping by? I don’t see how you wouldn’t.

And considering that your code output is larger, the share of it that is buggy is larger, and (presumably) you ship faster, have you considered what that compounds to in terms of the likelihood of incidents?

replies(1): >>Curiou+1s
26. strayd+49[view] [source] 2026-02-04 22:46:05
>>kace91+(OP)
No pun intended, but it's been more "vibes" than science that led me to work this way. It's more effective. When I focus my attention on the harness layer (tests, hooks, checks, etc.) and the inputs, my overall velocity improves relative to reading & debugging the code directly.

To be fair - it is not accurate to say I absolutely never read the code. It's just rare, and it's much more the exception than the rule.

My workflow just focuses much more on the final product and the initial input layer, not the code - the code itself is becoming less consequential.

◧◩◪
27. gibson+qa[view] [source] [discussion] 2026-02-04 22:54:10
>>thefz+D6
My understanding is that it's quite easy to poison the models with inaccurate data; I wouldn't be surprised if this exact thing has happened already. Maybe not by an AI company itself, but it's definitely within the reach of a hostile actor to create bad code for this purpose. I suppose it's kind of already happened via supply-chain attacks using AI-generated package names that didn't exist prior to the LLM generating them.
◧◩
28. strayd+Ga[view] [source] [discussion] 2026-02-04 22:55:21
>>csalle+T2
Exactly this. The code is an intermediate artifact - what I actually care about is: does the product work, does it meet the spec, do the tests pass?

I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it.

replies(1): >>nubg+3I
◧◩◪◨
29. strayd+Pa[view] [source] [discussion] 2026-02-04 22:56:43
>>Curiou+T6
"Testing ladders" is a great framing.

My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review happens, it's just happening through tooling rather than my eyeballs.
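Concretely, the hook end of that harness can be as small as something like this (a sketch, not my actual script; the linter, type checker, and coverage number are just examples to swap for whatever the project uses):

    #!/usr/bin/env python3
    """check.py -- tiny commit gate, called from .git/hooks/pre-commit.
    Illustrative only: substitute your own linter/type checker/test runner."""
    import subprocess
    import sys

    # One entry per rung of the harness, cheapest first, so the agent
    # (or I) get fast feedback before the slow checks run.
    CHECKS = [
        ("lint", ["ruff", "check", "."]),
        ("types", ["mypy", "src"]),
        ("tests", ["pytest", "-q", "--cov", "--cov-fail-under=85"]),
    ]

    def main() -> int:
        for name, cmd in CHECKS:
            print(f"[harness] {name}: {' '.join(cmd)}")
            if subprocess.run(cmd).returncode != 0:
                print(f"[harness] {name} failed; blocking the commit")
                return 1
        print("[harness] all checks passed")
        return 0

    if __name__ == "__main__":
        sys.exit(main())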

◧◩
30. add-su+ic[view] [source] [discussion] 2026-02-04 23:04:29
>>csalle+T2
A photo isn't going to fail next week or three months from now because it's full of bugs no one's triggered yet.

Specious analogies don't help anything.

◧◩
31. kace91+od[view] [source] [discussion] 2026-02-04 23:10:26
>>seanmc+L7
>The AI also writes the black box tests, what am I missing here?

If the AI misinterpreted your intentions and/or missed something in production code, tests are likely to reproduce rather than catch that behavior.

In other words, if “the AI is checking as well,” no one is.

replies(1): >>seanmc+ag
32. AlexCo+Gd[view] [source] 2026-02-04 23:12:19
>>kace91+(OP)
> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

It's producing seemingly working code faster than you can closely review it.

replies(1): >>kace91+Qe
◧◩
33. kace91+Qd[view] [source] [discussion] 2026-02-04 23:13:27
>>hjoutf+h8
Explain, then, how testing the functionality (not just the new functionality; regressions included, this is not a school exercise) is faster than checking the code.

Are you writing black box tests by hand, or manually checking, everything that would normally be a unit test? We have unit tests precisely because of how unworkable the “every test is black box” approach is.

34. eikenb+ze[view] [source] 2026-02-04 23:17:51
>>kace91+(OP)
I think many people are missing the overall meaning of these sorts of posts: they are describing a new type of programmer who will only use agents and never read the underlying code. These vibe/agent coders will use natural(-ish) language to communicate with the agents and won't look at the code any more than, say, a PHP developer would look at the underlying assembly. That is simply not the level of abstraction they work at. There are many use cases where this type of coding will work fine, and it will let many people who previously couldn't really take advantage of computers do so. This is great, but it will do nothing to replace the need for code that humans must understand (which, in turn, requires participation in the writing).
replies(3): >>strayd+Ag >>re-thc+gh >>jkhdig+di
◧◩
35. kace91+Qe[view] [source] [discussion] 2026-02-04 23:19:13
>>AlexCo+Gd
Your car can also move faster than what you can safely control. Knowing this, why go pedal to the metal?
◧◩◪
36. seanmc+ag[view] [source] [discussion] 2026-02-04 23:26:30
>>kace91+od
That's true. For sure, never let the AI see the code it wrote when it's writing the tests. Write multiple tests, and have an arbitrator (also an AI) figure out whether the implementation or the tests are wrong when tests fail. Have the AI heavily comment the code and heavily comment the tests in the language of your spec, so you can manually verify that the scenarios and parts of the implementation make sense when it matters.

etc...etc...

> In other words, if “the ai is checking as well” no one is.

"I tried nothing, and nothing at all worked!"

◧◩
37. strayd+Ag[view] [source] [discussion] 2026-02-04 23:28:49
>>eikenb+ze
I'm glad you wrote this comment because I completely agree with it. I'm not saying there is no need for software engineers who deeply consider architecture; who can fully understand the truly critical systems that exist at most software companies; who can help dream up the harness capabilities to make these agents work better.

I'm just describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach - but I think it'll become the most common way of producing software.

◧◩
38. re-thc+gh[view] [source] [discussion] 2026-02-04 23:34:35
>>eikenb+ze
> they are describing a new type of programmer who will only use agents and never read the underlying code

> and won't look at the code any more than, say, a PHP developer would look at the underlying assembly

This really puts down the work that the PHP maintainers have done. Many people spend a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I as a PHP developer assume.

Is this what the agents do? No. They scrape random bits of code everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials?

replies(1): >>6510+WP
◧◩
39. jkhdig+di[view] [source] [discussion] 2026-02-04 23:40:22
>>eikenb+ze
Your analogy to PHP developers not reading assembly got me thinking.

Early resistance to high-level (i.e. compiled) languages came from assembly programmers who couldn’t imagine that the compiler could generate code that was just as performant as their hand-crafted product. For a while they were right, but improved compiler design and the relentless performance increases in hardware made it so that even an extra 10-20% boost you might get from perfectly hand-crafted assembly was almost never worth the developer time.

There is an obvious parallel here, but it’s not quite the same. The high-level language is effectively a formal spec for the abstract machine which is faithfully translated by the (hopefully bug-free) compiler. Natural language is not a formal spec for anything, and LLM-based agents are not formally verifiable software. So the tradeoffs involved are not only about developer time vs. performance, but also correctness.

replies(6): >>ytoaww+Dk >>HansHa+Nn >>Quadma+vq >>drawnw+xz >>bandra+0C >>andai+BD
◧◩◪
40. ytoaww+Dk[view] [source] [discussion] 2026-02-04 23:57:58
>>jkhdig+di
For a great many software projects no formal spec exists. The code is the spec, and it gets modified constantly based on user feedback and other requirements that often appear out of nowhere. For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work.

Put another way, if you don't know what correct is before you start working then no tradeoff exists.

replies(1): >>majorm+CK
◧◩◪
41. HansHa+Nn[view] [source] [discussion] 2026-02-05 00:17:06
>>jkhdig+di
> which is faithfully translated by the (hopefully bug-free) compiler.

"Hey Claude, translate this piece of PHP code into Power10 assembly!"

42. bloomc+0o[view] [source] 2026-02-05 00:18:45
>>kace91+(OP)
I think this is the logical next step -- instead of manually steering the model, just rely on the acceptance criteria and some E2E test suite (that part is tricky, since you still need to verify the test suite itself).

I personally think we are not that far from it, but it will need something built on top of current CLI tools.

◧◩◪
43. Quadma+vq[view] [source] [discussion] 2026-02-05 00:38:46
>>jkhdig+di
Imagine if high-level coding worked like this: write a first draft, and get assembly. All subsequent high-level code is written in a REPL and expresses changes to the assembly, or queries the state of the assembly, and is then discarded. Only the assembly is checked into version control.
replies(1): >>6510+KP
◧◩◪
44. sroeri+Fr[view] [source] [discussion] 2026-02-05 00:47:36
>>ink_13+i5
You can get an AI to listen to that bass solo for you
◧◩◪◨⬒
45. Curiou+1s[view] [source] [discussion] 2026-02-05 00:49:58
>>kace91+Y8
There's definitely a class of bugs that is a lot more common, where the code deviates from the intent in some subtle way while still being functional. I deal with this using benchmarking and heavy dogfooding; both of these really expose errors and rough edges well.
◧◩◪
46. djeast+sw[view] [source] [discussion] 2026-02-05 01:27:06
>>thefz+D6
One mitigation might be to use one company's model to check the code written by another company's model, and depend on market competition to keep the checks and balances.
replies(1): >>thefz+T21
47. andyfe+oz[view] [source] 2026-02-05 01:51:21
>>kace91+(OP)
The output is the program behavior. You use it, like a user, and give feedback to the coding agent.

If the app is too bright, you tweak the settings and build it again.

Photography used to involve developing film in dark rooms. Now my iPhone does... god knows what to the photo - I just tweak in post, or reshoot. I _could_ get the raw, understand the algorithm to transform that into sRGB, understand my compression settings, etc - but I don't need to.

Similarly, I think there will be people who create useful software without looking at what happens in between. And there will still be low-level software engineers for whom what happens in between is their job.

◧◩◪
48. drawnw+xz[view] [source] [discussion] 2026-02-05 01:52:16
>>jkhdig+di
It's also important to remember that vibe coders throw away the natural language spec each time they close the context window.

Vibe coding is closer to compiling your code, throwing the source away and asking a friend to give you source that is pretty close to the one you wrote.

◧◩◪
49. bandra+GB[view] [source] [discussion] 2026-02-05 02:09:01
>>thefz+D6
Already happening in the wild
◧◩◪
50. bandra+0C[view] [source] [discussion] 2026-02-05 02:11:09
>>jkhdig+di
OK, but I've definitely read the assembly listings my C compiler produced when it wasn't working like I hoped. Even if that's not all that frequent, it's something I expect to have to do from time to time, and it's definitely part of "programming".
◧◩◪
51. wtetzn+aC[view] [source] [discussion] 2026-02-05 02:12:33
>>kace91+f4
Is reviewing outputs really more efficient than writing the code? Especially if it's a code base you haven't written code in?
replies(1): >>kuschk+Di1
◧◩◪
52. andai+BD[view] [source] [discussion] 2026-02-05 02:25:39
>>jkhdig+di
> So the tradeoffs involved are not only about developer time vs. performance, but also correctness.

The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.

The biggest headache I can see right now is just the humans keeping track of all the new code, because it arrives faster than they can digest it.

But I guess "let go of the need to even look at the code" "solves" that problem, for many projects... Strange times!

For example -- someone correct me if I'm wrong -- OpenClaw was itself almost entirely written by AI, and the developer bragged about not reading the code. If anything, in this niche, that actually helped the project's success, rather than harming it.

(In the case of Windows 11 recently.. not so much ;)

replies(1): >>majorm+rK
◧◩◪
53. nubg+3I[view] [source] [discussion] 2026-02-05 03:06:26
>>strayd+Ga
People miss this a lot. Coding is just a (small) part of building a product. You get a much better bang for the buck if you focus your time on talking to the user, dogfooding, and then vibecoding. It also allows you to do many more iterations, even with large changes, because since you didn't "write" the code, you don't care about throwing it away.
◧◩
54. nubg+fI[view] [source] [discussion] 2026-02-05 03:08:28
>>rainco+E7
Good analogy.
◧◩◪◨
55. majorm+rK[view] [source] [discussion] 2026-02-05 03:25:47
>>andai+BD
> The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.

It's certainly hard to find, in consumer tech, an example of a product that was displaced in the market by a slower-moving competitor due to buggy releases. Infamously, "move fast and break things" has been the rule of the land.

In SaaS and B2B, deterministic results become much more important. There are still bugs, of course, but showstopper bugs are major business risks. And combinatorial state+logic still makes testing a huge tarpit.

The world didn't spend the last century turning customer service agents and business-process workers into script-following human-robots for no reason, and big parts of it won't want to reintroduce high levels of randomness... (That's not even necessarily good for any particular consumer - imagine an insurance company with a "claims agent" that got sweet-talked into spending hundreds of millions more on things that were legitimate benefits for its customers, but that management wanted to limit whenever possible on technicalities.)

◧◩◪◨
56. majorm+CK[view] [source] [discussion] 2026-02-05 03:27:13
>>ytoaww+Dk
> Put another way, if you don't know what correct is before you start working then no tradeoff exists.

This goes out the window the first time you get real users, though. Hyrum's Law bites people all the time.

"What sorts of things can you build if you don't have long-term sneaky contracts and dependencies" is a really interesting question and has a HUGE pool of answers that used to be not worth the effort. But it's largely a different pool of software than the ones people get paid for today.

replies(1): >>ytoaww+KZ
◧◩◪◨
57. 6510+KP[view] [source] [discussion] 2026-02-05 04:19:06
>>Quadma+vq
Or the opposite: all applications are just text files with prompts in them, and the assembly lives as ravioli in many temp files. It only builds the code that is used. You can extend the prompt while using the application.
◧◩◪
58. 6510+WP[view] [source] [discussion] 2026-02-05 04:20:57
>>re-thc+gh
That is true for all languages. Very high quality until you use a lib, a module, or an API.
◧◩
59. 6510+dQ[view] [source] [discussion] 2026-02-05 04:23:24
>>csalle+T2
The product is: solving a problem. Requirements vary.
◧◩◪◨⬒
60. ytoaww+KZ[view] [source] [discussion] 2026-02-05 06:03:58
>>majorm+CK
> This goes out the window the first time you get real users, though.

Not really. Many users are happy for their software to change if it's a genuine improvement. Some users aren't, but you can always fire them.

Certainly there's a scale beyond which this becomes untenable, but it's far higher than "the first time you get real users".

◧◩◪◨
61. thefz+T21[view] [source] [discussion] 2026-02-05 06:35:21
>>djeast+sw
What about writing the actual code yourself?
◧◩◪◨
62. kuschk+Di1[view] [source] [discussion] 2026-02-05 08:56:37
>>wtetzn+aC
It is not. To review code you need to have an understanding of the problem that can only be built by writing code. Not necessarily the final product, but at least prototypes and experiments that then inform the final product.
◧◩
63. sigseg+xn1[view] [source] [discussion] 2026-02-05 09:36:00
>>ForHac+O4
Hey it's me! I shoot with manual focus lenses in RAW and drive a standard. There are dozens of us!