One could argue that a common characteristic of the above exams is that they each test memory, and that, as such, GPT-4's above-average performance is not necessarily evidence of "reasoning". On this view, GPT-4 has no "understanding"; it simply has formidable reading speed and retention (memory).
While preparation for the above exams depends heavily on memorisation, other exams may focus more on reasoning and understanding.
Surely GPT-4 would fail some exams, but when it comes to its exam performance, only the positive results are reported.
I don’t think they will make this argument, since it would largely undermine their (expected) claim that they’re not open-sourcing the model because of safety concerns.