Plugins were a failure. GPTs are a little better, but I still don't see the product market fit. GPT-4 is still king, but not by that much any more. It's not even clear that they're doing great research, because they don't publish.
GPT-5 has to be incredibly good at this point, and I'm not sure that it will be.
This isn’t a race to write the most lines of code or the most lines of text. It’s a race to write the most correct lines of code.
I'll wait half an hour for a response if I know I'm getting at least staff-engineer-level code for every question.
Sufficiently accurate responses can be fed into other systems downstream and cleaned up. Even code responses can benefit from this, by constraining output tokens to the grammar of the target language, or by iterating until the code compiles successfully.
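The iterate-until-it-compiles idea can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation: `generate` is a hypothetical stand-in for a model call that accepts the prompt plus compiler feedback from the previous attempt, and Python's built-in `compile` stands in for the target-language compiler.

```python
def refine_until_valid(generate, prompt, max_attempts=3):
    """Repeatedly call a hypothetical `generate(prompt, feedback)` model
    until its output compiles, feeding the error message back as
    feedback on each retry. Returns None if every attempt fails."""
    feedback = None
    for _ in range(max_attempts):
        code = generate(prompt, feedback)
        try:
            # Stand-in for the target language's compiler:
            # here, just check that the snippet parses as Python.
            compile(code, "<generated>", "exec")
            return code
        except SyntaxError as e:
            feedback = str(e)  # pass the error back to the model
    return None
```

The same loop generalizes to any downstream validator: swap the `compile` call for a type checker, a test suite, or a schema validator, and the model gets one more chance per failure to clean up its own output.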
And for a decent number of LLM-enabled use cases, the functionality unlocked by these models is genuinely novel. When you're going from 0 to 1, people will just be amazed that the product exists.