> One could argue a common characteristic of the above exams is that they each test memory, and, as such, one could argue that GPT-4's above-average performance is not necessarily evidence of "reasoning". That is, GPT-4 has no "understanding" but it has formidable reading speed and retention (memory).
I don’t think they will make this argument, since it would seriously undercut their (expected) claim that they’re not open-sourcing the model because of safety concerns.