It is possible the model computes an approximate board state: one that differs from the true board state but is equivalent in most games, though not all. It would be interesting to train an adversarial policy to check this. From the KataGo attack we know this does happen with Go AIs: Go rules have a concept of liberties, but so-called pseudoliberties are easier to calculate and equivalent in most cases (but not all). In fact, human programmers have also used pseudoliberties to optimize their engines. The adversarial attack found that Go AIs use pseudoliberties too.
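To make the liberty vs. pseudoliberty distinction concrete, here is a minimal sketch. It assumes the usual engine definition of pseudoliberties (empty neighbors counted once per adjacent stone, so a shared empty point is counted more than once). The board representation (a dict of occupied points) and all function names are made up for illustration and are not taken from KataGo or any other engine.

```python
from typing import Dict, Iterator, Set, Tuple

Point = Tuple[int, int]
Board = Dict[Point, str]  # occupied points only, e.g. {(1, 1): "B"}; hypothetical layout


def neighbors(p: Point, size: int) -> Iterator[Point]:
    """Yield the on-board orthogonal neighbors of p."""
    r, c = p
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield (nr, nc)


def group_at(board: Board, start: Point, size: int) -> Set[Point]:
    """Flood-fill the connected group of same-colored stones containing start."""
    color = board[start]
    group = {start}
    stack = [start]
    while stack:
        p = stack.pop()
        for n in neighbors(p, size):
            if board.get(n) == color and n not in group:
                group.add(n)
                stack.append(n)
    return group


def liberties(board: Board, start: Point, size: int) -> int:
    """True liberties: distinct empty points adjacent to the group."""
    group = group_at(board, start, size)
    return len({n for p in group for n in neighbors(p, size) if n not in board})


def pseudoliberties(board: Board, start: Point, size: int) -> int:
    """Pseudoliberties: empty neighbors summed per stone, with no deduplication."""
    group = group_at(board, start, size)
    return sum(1 for p in group for n in neighbors(p, size) if n not in board)


# An L-shaped black group on a 5x5 board; the empty point (2, 2)
# touches two stones of the group, so it is counted twice below.
l_shape: Board = {(1, 1): "B", (1, 2): "B", (2, 1): "B"}
print(liberties(l_shape, (1, 1), 5))        # 7
print(pseudoliberties(l_shape, (1, 1), 5))  # 8
```

In this sketch the two counts are both zero or both nonzero together, so capture detection comes out the same, but the exact numbers diverge as soon as one empty point touches two stones of the same group. The appeal for engine writers is that the pseudoliberty count can be kept up to date with simple additions and subtractions as stones are placed, without tracking a set of distinct points.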
Yes - this is exactly what the probes show.
One interesting aspect is that it still learns to play when trained on blocks of move sequences starting from the MIDDLE of the game, so it seems it must be incrementally inferring the board state from what's being played rather than just tracking the moves.