I can’t imagine any other example where people voluntarily move to a black box approach.
Imagine taking a picture on auto mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.
What is the logic here? Because if you can read code, I can’t imagine that poking at the result with black box testing is faster.
Are these people just handing off the review process to others? Are they unable to read code and hiding it? Why would you handicap yourself this way?
Anyone overseeing work from multiple people has to? At some point you have to let go and trust people’s judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I’m not one of them, btw) must rely on higher-level reports, maybe drilling into this or that piece of code occasionally.
The output of code isn't just the code itself, it's the product. The code is a means to an end.
So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal.
That's not a black box though. Someone is still reading the code.
> At some point you have to let go and trust people’s judgement
Where are the people in this case?
> People who do that (I’m not one of them, btw) must rely on higher-level reports.
Does such a thing exist here? Just "done".
It is right often enough that your time is better spent testing the functionality than reading the code.
Sometimes it’s not right, and you need to re-instruct (often) or dive in (not very often).
I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much.
Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.
>Reading and understanding the whole output of 9 concurrently running agents is impossible.
I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurrent agents?
I get it, writing code no longer has a physical bottleneck. So the bottleneck moves to the next thing, which is our ability to review outputs. That’s already a giant advancement; why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.
Almost everyone does this. Hardly anyone taking pictures understands what f-stop or focal length are. Even those who do seldom adjust them.
There are dozens of other examples where people voluntarily move to a black box approach. How many Americans drive a car with a manual transmission?
I’ll bite. Is this person manually testing everything that one would regularly unit test? Or writing black box tests that he knows are correct because they were written by hand?
If not, you’re not reviewing the product either. If yes, it’s less time-consuming to actually read and test the damn code.
I don't know... it depends on the use case. I can't imagine that even the best front-end engineer can read HTML faster than looking at the rendered webpage to check whether the layout is correct.
The AI also writes the black box tests, what am I missing here?
Code is not the output. Functionality is the output, and you do look at that.
But you are not. That’s the point?
> Where are the people in this case?
Juniors build worse code than Codex. Their superiors also can’t check everything they do. They need to extend some level of trust, even knowing juniors will do dumb shit, or they can’t hire juniors at all.
> Does such a thing exist here? Just "done".
Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info compared to reading the output yourself, but it won’t be zero info.
So a percentage of your code, chosen by gut feeling, is left unseen by any human by the time you submit it.
Do you agree that this raises the chance of bugs slipping by? I don’t see how you wouldn’t.
And considering that your code output is larger, the share of it that is buggy is larger, and (presumably) you ship faster, have you followed that to its conclusion in terms of the compounding likelihood of incidents? Say 2% of reviewed changes ship a bug but 5% of unread ones do, and you ship three times as many changes: that’s not a marginal bump, that’s several times the incident rate.
To be fair - it is not accurate to say I absolutely never read the code. It's just rare, and it's much more the exception than the rule.
My workflow just focuses much more on the final product and the initial input layer, not the code; the code itself is becoming less consequential.
I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it.
My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review happens, it's just happening through tooling rather than my eyeballs.
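A minimal sketch of what I mean by the harness layer, assuming a Python project with ruff, mypy, and pytest installed (swap in whatever your stack uses), wired into a pre-commit hook or CI:

    import subprocess
    import sys

    # Each check must pass before agent-produced code is allowed to land.
    CHECKS = [
        ["ruff", "check", "."],           # lint
        ["mypy", "src"],                  # static type check
        ["pytest", "-q", "--maxfail=1"],  # behavioral tests
    ]

    def main() -> int:
        for cmd in CHECKS:
            if subprocess.run(cmd).returncode != 0:
                print(f"gate failed: {' '.join(cmd)}", file=sys.stderr)
                return 1  # non-zero exit blocks the commit/merge
        return 0

    if __name__ == "__main__":
        sys.exit(main())

The point is that the gate, not a human reader, is what every change has to get past.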
Specious analogies don't help anything.
If the AI misinterpreted your intentions and/or missed something in production code, tests are likely to reproduce that behavior rather than catch it.
In other words, if “the AI is checking as well,” no one is.
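A toy Python example (all names hypothetical): say the spec meant “round half up” for prices, the agent reached for Python’s built-in round(), which rounds half to even, and then derived the test from its own implementation:

    # The spec intended "round half up", but round() rounds half to even.
    def round_price(value: float) -> int:
        return round(value)  # round(2.5) == 2, not the intended 3

    # The agent derives the test from the same misreading, so it passes.
    def test_round_price_half():
        assert round_price(2.5) == 2  # green test, wrong behavior per spec

The suite is green, the behavior is wrong per the spec, and nothing in the loop ever checked the code against the actual intent.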
It's producing seemingly working code faster than you can closely review it.
Are you writing black box tests by hand, or manually checking, everything that would normally be a unit test? We have unit tests precisely because the “every test is a black box” approach is unworkable (toy example below).
etc...etc...
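To make the unit-test point concrete, a toy example with hypothetical names: the edge case below is one line to pin in a unit test, but reaching it through black box testing means contriving an end-to-end scenario that happens to produce an empty basket:

    # One internal edge case, pinned directly.
    def apply_discount(total_cents: int, percent: int) -> int:
        if total_cents == 0:
            return 0  # nothing to discount
        return total_cents - (total_cents * percent) // 100

    def test_discount_on_empty_basket():
        assert apply_discount(0, 10) == 0

Multiply that by every internal branch in a codebase and black-box-only verification stops scaling.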
> In other words, if “the AI is checking as well,” no one is.
"I tried nothing, and nothing at all worked!"
I'm just describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach - but I think it'll become the most common way of producing software.
> and wouldn't look at the code anymore than, say, a PHP developer would look at the underlying assembly
This really puts down the work that the PHP maintainers have done. Many people spend a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I, as a PHP developer, can assume.
Is this what the agents do? No. They scrape random bits of code everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials?
Early resistance to high-level (i.e. compiled) languages came from assembly programmers who couldn’t imagine that the compiler could generate code that was just as performant as their hand-crafted product. For a while they were right, but improved compiler design and the relentless performance increases in hardware made it so that even an extra 10-20% boost you might get from perfectly hand-crafted assembly was almost never worth the developer time.
There is an obvious parallel here, but it’s not quite the same. The high-level language is effectively a formal spec for the abstract machine which is faithfully translated by the (hopefully bug-free) compiler. Natural language is not a formal spec for anything, and LLM-based agents are not formally verifiable software. So the tradeoffs involved are not only about developer time vs. performance, but also correctness.
Put another way, if you don't know what correct is before you start working then no tradeoff exists.
"Hey Claude, translate this piece of PHP code into Power10 assembly!"
I personally think we are not that far from it, but it will need something built on top of current CLI tools.
If the app is too bright, you tweak the settings and build it again.
Photography used to involve developing film in dark rooms. Now my iPhone does... god knows what to the photo - I just tweak in post, or reshoot. I _could_ get the raw, understand the algorithm to transform that into sRGB, understand my compression settings, etc - but I don't need to.
Similarly, I think there will be people who create useful software without looking at what happens in between. And there will still be low-level software engineers for whom what happens in between is their job.
Vibe coding is closer to compiling your code, throwing the source away and asking a friend to give you source that is pretty close to the one you wrote.
The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.
The biggest headache I can see right now is just the humans keeping track of all the new code, because it arrives faster than they can digest it.
But I guess "let go of the need to even look at the code" "solves" that problem, for many projects... Strange times!
For example -- someone correct me if I'm wrong -- OpenClaw was itself almost entirely written by AI, and the developer bragged about not reading the code. If anything, in this niche, that actually helped the project's success, rather than harming it.
(In the case of Windows 11 recently... not so much ;)
It's certainly hard to find, in consumer tech, an example of a product that was displaced in the market by a slower-moving competitor due to buggy releases. Infamously, "move fast and break things" has been the law of the land.
In SaaS and B2B, deterministic results become much more important. There are still bugs, of course, but showstopper bugs are major business risks. And combinatorial state+logic still makes testing a huge tarpit.
The world didn't spend the last century turning customer service agents and business-process workers into script-following human robots for no reason, and big parts of it won't want to reintroduce high levels of randomness... (That's not even necessarily good for any particular consumer - imagine an insurance company with a "claims agent" that got sweet-talked into spending hundreds of millions more on things that were legitimate benefits for their customers, but that management wanted to limit whenever possible on technicalities.)
This goes out the window the first time you get real users, though. Hyrum's Law (every observable behavior of your system will eventually be depended on by somebody) bites people all the time.
"What sorts of things can you build if you don't have long-term sneaky contracts and dependencies" is a really interesting question and has a HUGE pool of answers that used to be not worth the effort. But it's largely a different pool of software than the ones people get paid for today.
Not really. Many users are happy for their software to change if it's a genuine improvement. Some users aren't, but you can always fire them.
Certainly there's a scale beyond which this becomes untenable, but it's far higher than "the first time you get real users".