But the model could in principle just have learned a long list of rote heuristics that happen to predict notation strings well, without ever making the inferential leap to the much simpler underlying rule set, and a learner weaker than an LLM could well have got stuck at that stage.
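One concrete way to probe that distinction: replay held-out games up to some ply, ask the model for the next move, and check whether its proposals are even *legal*. Shallow string-level heuristics should start emitting illegal moves in rare positions, whereas a model that has internalized the rules should stay close to 100% legal. Here's a minimal sketch of such a probe using the python-chess library; `sample_fn` is a hypothetical stand-in for however one queries the model:

```python
import chess
from typing import Callable

def legality_rate(
    games: list[list[str]],
    sample_fn: Callable[[list[str]], str],
    probe_ply: int = 20,
) -> float:
    """Fraction of model-proposed moves that are legal.

    games:     held-out games, each a list of SAN moves, so pure
               memorization can't help.
    sample_fn: hypothetical model interface; takes the SAN history
               and returns the model's proposed next move in SAN.
    """
    legal = total = 0
    for game in games:
        if len(game) <= probe_ply:
            continue
        board = chess.Board()
        for san in game[:probe_ply]:   # replay the game prefix
            board.push_san(san)
        total += 1
        try:
            # parse_san raises ValueError on illegal or unparsable moves
            board.parse_san(sample_fn(game[:probe_ply]))
            legal += 1
        except ValueError:
            pass
    return legal / total if total else 0.0
```

A high legality rate wouldn't prove the model recovered the rules (it could still be playing badly), but a legality rate that collapses in unusual positions would be strong evidence it's still at the rote-heuristics stage.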
I wonder how well a human (or a group of humans) would fare at the same task, and whether they could also successfully reconstruct chess with no prior knowledge of its rules or notation.
(OTOH, a GPT-3+ level LLM certainly does know that chess notation is related to something called "chess", which is a "game" and has certain "rules"; but to what extent can it actually make use of that information?)