The experiment could be improved by using a more descriptive notation than PGN. PGN's strength is its brevity, since humans use it to record games as they play. That brevity is far from a strength in LLM training data. ML models, LLMs included, train better on more descriptive and accurate data, and verbosity is not a problem at all. There is FEN, for instance, which encodes the entire board position and could be emitted after every move.
One could easily imagine many different ways to describe a game: encoding the vertical and horizontal lines, listing exactly which squares each piece is covering, what color those squares are, and which pieces are able to move, generating a whole page describing the board after every move.
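As a rough illustration of that kind of verbose encoding, here is a minimal sketch that expands the piece-placement field of a FEN string into one line of text per piece. The function names `expand_fen_board` and `describe` are made up for this example, and it only covers piece locations and square colors, not attacked squares or legal moves.

```python
# Hypothetical helpers: only the first (piece-placement) field of FEN is used.
PIECE_NAMES = {"p": "pawn", "n": "knight", "b": "bishop",
               "r": "rook", "q": "queen", "k": "king"}
FILES = "abcdefgh"

def expand_fen_board(fen: str) -> dict:
    """Map each occupied square (e.g. 'e1') to its FEN piece letter."""
    placement = fen.split()[0]
    board = {}
    for rank_index, row in enumerate(placement.split("/")):
        rank = 8 - rank_index          # FEN lists rank 8 first
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)  # a digit is a run of empty squares
            else:
                board[FILES[file_index] + str(rank)] = ch
                file_index += 1
    return board

def describe(board: dict) -> str:
    """Write one verbose line per piece, including the square color."""
    lines = []
    for square in sorted(board):
        piece = board[square]
        color = "white" if piece.isupper() else "black"
        # a1 is dark, and the shade alternates with file + rank parity.
        parity = (FILES.index(square[0]) + int(square[1])) % 2
        shade = "dark" if parity == 1 else "light"
        name = PIECE_NAMES[piece.lower()]
        lines.append(f"{color} {name} on {square} ({shade} square)")
    return "\n".join(lines)

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
text = describe(expand_fen_board(start))
```

Running this after every move would give the "whole page per move" training text, at the cost of much longer sequences.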
I call this spatial navigation: the LLM learns the ins and outs of its training data and is able to make more informed guesses. Chess is fun and all, but code generation has the potential to be a lot better than just writing functions. Feeding the LLM the AST representation of the code, the tree of workspace files, the public items, and the module hierarchy alongside the code itself could be a significant improvement.
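For the code case, a minimal sketch of what such an enriched training record might look like, using Python's standard `ast` module. The record layout and the `describe_source` name are assumptions made for this example, not an established format.

```python
import ast

def describe_source(source: str) -> dict:
    """Bundle source text with its AST dump and its public top-level items."""
    tree = ast.parse(source)
    public_items = [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and not node.name.startswith("_")  # leading underscore = private by convention
    ]
    return {
        "source": source,
        "public_items": public_items,
        "ast": ast.dump(tree),  # textual form of the syntax tree
    }

sample = "def add(a, b):\n    return a + b\n\ndef _helper():\n    pass\n"
record = describe_source(sample)
```

The file tree and module hierarchy would need a similar pass over the whole workspace; this only covers a single file.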
The author seems more interested in the ability to learn chess at a decent level from such a poor input, as well as what kind of world model it might build, rather than wanting to help it to play as well as possible.
The fact that it was able to build a decent model of the board position from PGN training samples, without knowing anything about chess (or that it was even playing chess) is super impressive.
It seems simple enough to learn that, for example, "Nf3" means that an "N" is on "f3", especially since predicting well requires you to know what piece is on each square.
However, what is not so simple is to have to learn - without knowing a single thing about chess - that "Nf3" also means that:
1) One of the 8 squares that is a knight's move away from f3, and had an "N" on it, now has nothing on it. There's a lot going on there!
2) If "f3" previously had a different piece on it, that piece is now gone (taken) - it should no longer also be associated with "f3"
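Spelled out as code, the bookkeeping in 1) and 2) looks roughly like the sketch below. It assumes a board stored as a dict from square name to piece letter, and a hypothetical `apply_knight_move` helper. It ignores disambiguated moves like "Nbd2", pins, and the "x" that PGN normally puts in a capture.

```python
FILES = "abcdefgh"
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def apply_knight_move(board: dict, target: str, knight: str = "N") -> dict:
    """Return a new board after a knight move such as 'Nf3' (target='f3')."""
    file_index = FILES.index(target[0])
    rank = int(target[1])
    # 1) Find the squares a knight's move away from the target that hold a knight.
    origins = []
    for d_file, d_rank in KNIGHT_JUMPS:
        f, r = file_index + d_file, rank + d_rank
        if 0 <= f < 8 and 1 <= r <= 8:
            square = FILES[f] + str(r)
            if board.get(square) == knight:
                origins.append(square)
    if len(origins) != 1:
        raise ValueError(f"expected exactly one origin, found {origins}")
    new_board = dict(board)
    del new_board[origins[0]]   # the origin square is now empty
    # 2) Whatever stood on the target is overwritten, i.e. captured.
    new_board[target] = knight
    return new_board

before = {"g1": "N", "b1": "N", "f3": "p"}
after = apply_knight_move(before, "f3")
```

The model has to infer all of this from move text alone, with no explicit board ever shown to it.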