When he started OpenAI, no one was spending anywhere near that much on compute. He should have listened to gwern.
It was quite reasonable to think that there would be rapidly diminishing returns in model size.
Wrong, in hindsight, but that's how hindsight is.
Honestly, no: it was obvious, but only if you listened to those pie-in-the-sky singularity people. It was quite common for them to say: add lots of nodes and transistors and a bunch of layers, stir in some math, and intelligence will pop out.
The groups talking about minimal data and processing have not had any breakthroughs in, like, forever.