Tell me: how does this claim _constrain my expectations_ about what this (or future) models can do? Is there a specific thing that you predicted in advance that GPT-4 would be unable to do, which ended up being a correct prediction? Is there a specific thing you want to predict in advance of the next generation, that it will be unable to do?
Another paper not from Msft showing emergent task capabilities across a variety of LLMs as scale increases.
https://arxiv.org/pdf/2206.07682.pdf
You can hem and haw all you want but the reality is these models have internal representations of the world that can be probed via prompts. They are not stochastic parrots no matter how much you shout in the wind that they are.