zlacker

Test suites just increased in value by a lot and code decreased in value.

replies(3): >>__mhar+4s >>utopia+bS >>well_a+722

>>andrew+(OP)
This is one of the reasons why I just wrote a testing book (beta reviewers are giving feedback now). Testing is one of those boring subjects that many programmers ignore. But it just got very relevant, especially TDD.

>>andrew+(OP)
Doubt it. Code will be generated to pass the tests, not to satisfy the intent behind the tests.

replies(4): >>daxfoh+Df1 >>krashi+Vq1 >>andrew+JF1 >>Art968+BJ2

>>utopia+bS
A million times, this. Sometimes they luck into the intent, but much more frequently they end up with a ball of mud that just happens to pass the tests.

"8 unit tests? Great, I'll code up 8 branches so all your tests pass!" Of course, that neglects the fact that there are now actually 2^8 paths through your code.

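To make the 2^8 point concrete, here is a minimal sketch, assuming the 8 branches are independent and each unit test exercises exactly one of them. All names are made up for illustration.

```python
from itertools import product

N = 8  # assumed: 8 independent if-branches, one per unit test

# Each hypothetical unit test takes exactly one branch and skips the other seven.
tested_paths = {tuple(i == j for j in range(N)) for i in range(N)}

# Every combination of "branch taken / not taken" is a distinct path through the code.
all_paths = set(product([False, True], repeat=N))

untested_paths = all_paths - tested_paths
print(len(all_paths), len(tested_paths), len(untested_paths))  # 256 8 248
```

Under those assumptions, the 8 passing tests say nothing about the other 248 combinations.
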
>>utopia+bS
If you can steer an LLM to write an application based on what you want, you can steer an LLM to write the tests you want. Some people will be better at getting the LLM to write tests, but it's only going to get easier and easier.

>>utopia+bS
I think we agree: getting the LLMs to understand your intent is the hard part. At the very least, you need well-specified tests.

Perhaps more advanced LLMs + specifications + better tests.
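
As a rough sketch of what "specifications + better tests" could look like: instead of a handful of example inputs, the test states a property that must hold for any input. Everything here (`EXAMPLES`, `overfit_sort`, `satisfies_sort_spec`) is hypothetical and only meant to illustrate the idea, using sorting as a stand-in problem.

```python
import random
from collections import Counter

# Hypothetical example-based tests: the only inputs the code was written against.
EXAMPLES = {(3, 1, 2): [1, 2, 3], (): [], (5,): [5]}

def overfit_sort(xs):
    """Passes every example above by lookup, but ignores the intent (sorting)."""
    return EXAMPLES.get(tuple(xs), list(xs))

def satisfies_sort_spec(sort_fn, trials=200, seed=0):
    """Check the spec on random inputs: output is ordered and is a permutation of the input."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]
        out = sort_fn(xs)
        ordered = all(a <= b for a, b in zip(out, out[1:]))
        if not ordered or Counter(out) != Counter(xs):
            return False
    return True

# The example-based tests pass for the overfit version...
assert all(overfit_sort(list(k)) == v for k, v in EXAMPLES.items())
# ...but the specification check tells it apart from a real sort.
print(satisfies_sort_spec(overfit_sort), satisfies_sort_spec(sorted))
```

The property check does not prove intent was captured, but it is much harder to satisfy by hard-coding the expected answers.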

>>andrew+(OP)
No, OP is merely an AI deepthroater that will blindly swallow whatever drivel is put out by AI companies and then "benchmark" it by having it generate a pelican (oh, and he got early access to the model), then call whatever he puts out "AI optimism".

The reality of things is, AI still can't handle long-running tasks without blowing $500k worth of tokens for an end result that doesn't work, and further work costs another $100k worth of tokens to get nothing novel.

replies(1): >>Xmd5a+G42

>>well_a+722
Where are you pulling these numbers from? I'm genuinely interested. Is it the kind of budget you need to spend in order to have Claude build a Word clone?

>>utopia+bS
What makes you think the next generation of models won't be explicitly trained to avoid this and any other pitfall, and to follow best practices, as the low-hanging fruit falls one by one?