zlacker

[return to "ChatGPT Is a Gimmick"]
1. danlit+ag 2025-05-22 07:43:31
>>blueri+(OP)
It is refreshing to see I am not the only person who cannot get LLMs to say anything valuable. I have tried several times, but the cycle "You're right to question this. I actually didn't do anything you asked for. Here is some more garbage!" gets really old really fast.

It makes me wonder whether everyone else is kidding themselves, or if I'm just holding it wrong.

2. jiggaw+Rj 2025-05-22 08:18:13
>>danlit+ag
Something I noticed a long time ago is that going from 90% correct to 95% correct is not a 5% difference, it’s a 2x difference: the error rate drops from 10% to 5%, so you get twice as far before the first mistake. As you approach 100%, shaving off the last fractions of a percent of error makes a qualitative difference.
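
To put rough numbers on that (back-of-envelope, and assuming each step fails independently, which real generation doesn’t quite do): the expected error-free run length is 1 / (1 - p), so halving the error rate doubles how far you get before the first mistake.

    # Mean number of steps before the first error, if each step is
    # independently correct with probability p (geometric distribution).
    for p in (0.90, 0.95, 0.99, 0.999):
        print(f"accuracy {p} -> ~{1 / (1 - p):.0f} steps before an error")
    # accuracy 0.9 -> ~10 steps before an error
    # accuracy 0.95 -> ~20 steps before an error
    # accuracy 0.99 -> ~100 steps before an error
    # accuracy 0.999 -> ~1000 steps before an error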

“Computer” used to be a job, and human error rates are on the order of 1-2% per operation, no matter the level of training or experience. Work had to be done in triplicate and cross-checked if it mattered.

Digital computers are down to error rates of roughly 10^-15 to 10^-22 per operation and are hence treated as nearly infallible. We regularly write routines where a trillion steps have to execute flawlessly in sequence for things not to explode!
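
Back-of-envelope again (same independence assumption): the chance that N steps all go right is (1 - e)^N, so at hardware error rates a trillion-step run is still near-certain, while at human error rates it’s hopeless.

    import math

    def p_flawless(error_rate, steps=1e12):
        # P(no error across `steps` independent steps) = (1 - e)^steps
        return math.exp(steps * math.log1p(-error_rate))

    print(p_flawless(1e-15))  # machine-grade: ~0.999 -- almost surely fine
    print(p_flawless(0.01))   # human-grade: ~0.0 -- hence triplicate and cross-checking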

AIs can now output maybe 1K to 2K tokens in a sequence before they make a mistake. That’s a per-token accuracy of 99.9% to 99.95%! Better than a human already.
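
That figure is just the same arithmetic inverted (treating tokens as independent trials, a simplification): one mistake per N tokens means a per-token accuracy of about 1 - 1/N.

    for run in (1_000, 2_000):
        print(f"one mistake per {run} tokens -> {1 - 1 / run:.2%} per-token accuracy")
    # one mistake per 1000 tokens -> 99.90% per-token accuracy
    # one mistake per 2000 tokens -> 99.95% per-token accuracy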

Don’t believe me?

Write me a 500-line program with pen and paper (not pencil!) and have it work the first time!

I’ve seen Gemini 2.5 Pro do this in a useful way.

As the error rates drop, the length of usefully correct sequences will get to 10K, then 100K, and maybe… who knows?

There was just a press release today about Gemini Diffusion, which can alter already-generated tokens to correct mistakes.

Error rates will drop.

Useful output length will go up.

3. hatefu+Tn 2025-05-22 08:52:31
>>jiggaw+Rj
I don't think the length you're talking about is that much of an issue. As you say, depending on how you measure it, LLMs are already better than humans at staying accurate over long spans of text.

The issue seems to be more in the intelligence department. You can't really leave them in an agent-like loop with compiler/shell output and expect them to make meaningful progress on a task past some small number of steps.
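
For concreteness, the kind of loop I mean (a hypothetical sketch; `llm` stands in for whatever completion call you're using):

    import subprocess

    def agent_loop(llm, task, max_steps=20):
        # Hypothetical fix-it loop: run the code, feed the failure back,
        # let the model try again.
        code = llm(f"Write a Python program that does: {task}")
        for _ in range(max_steps):
            result = subprocess.run(["python", "-c", code],
                                    capture_output=True, text=True)
            if result.returncode == 0:
                return code  # nominally done
            # In practice this stops converging after a handful of steps:
            # the model loops, regresses, or declares victory prematurely.
            code = llm(f"That failed with:\n{result.stderr}\nFix it:\n{code}")
        return None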

Improving their initial error-free token length is solving the wrong problem. I would take a model with less initial accuracy than a human, but one equally capable of iterating on its solution over time.
