(Note that I'm not saying you can't find examples of failures of intelligence. I'm just questioning whether this specific test is one.)
Also, my bet would be that video-capable models are better at this.
So, back to the analogy: it could be that the LLMs experience the equivalent of a very intense optical illusion in these cases, and then completely fall apart trying to make sense of it.