Strictly speaking, it's a mistake to assign a probability of exactly zero to any move, even an illegal one, and especially for an AI that learns by example and self-play. It is never taught the rules; it is only shown games. There's no reason it should conclude that the probability of a rook moving diagonally is exactly zero just because it has never seen that happen in the data and gets penalized in training every time it tries it.
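A minimal sketch of the principle, using additive (Laplace) smoothing over counted moves rather than anything a neural network actually does during training: every candidate move gets a pseudo-count, so a move never seen in the data still ends up with a small nonzero probability. The move labels and the `alpha` parameter are hypothetical illustrations.

```python
from collections import Counter

def smoothed_move_probs(observed_moves, candidate_moves, alpha=1.0):
    """Additive (Laplace) smoothing: each candidate gets `alpha` pseudo-counts,
    so unseen candidates get a small but nonzero probability."""
    unknown = set(observed_moves) - set(candidate_moves)
    if unknown:
        raise ValueError(f"observed moves not in candidate set: {unknown}")
    counts = Counter(observed_moves)
    total = len(observed_moves) + alpha * len(candidate_moves)
    return {move: (counts[move] + alpha) / total for move in candidate_moves}

# Hypothetical data: 98 straight rook moves seen, zero diagonal ones.
probs = smoothed_move_probs(["rook_straight"] * 98,
                            ["rook_straight", "rook_diagonal"])
print(probs)  # {'rook_straight': 0.99, 'rook_diagonal': 0.01}
```

Larger `alpha` spreads more mass onto unseen moves; any `alpha > 0` is enough to keep them off exactly zero.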
But even for a human, assigning a probability of exactly zero is too strong. It would rule out any possibility that you've misunderstood a rule or forgotten a special case. It's a good idea to always keep at least a small amount of epistemic humility about the rules, so that sufficiently strong evidence could convince you that a move you thought was illegal is actually legal.
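The point about evidence falls straight out of Bayes' rule (sometimes called Cromwell's rule): a prior of exactly zero stays zero no matter what you observe, while even a tiny prior can be pushed close to one. A minimal sketch, where the likelihoods are made-up numbers standing for "how likely I'd be to see this if the move were legal vs. illegal":

```python
def posterior(prior, p_obs_if_legal, p_obs_if_illegal):
    """One Bayes update of P(move is legal) after a single observation."""
    evidence = prior * p_obs_if_legal + (1 - prior) * p_obs_if_illegal
    return prior * p_obs_if_legal / evidence

def update(prior, p_obs_if_legal, p_obs_if_illegal, n_observations):
    """Apply n independent, identical observations one after another."""
    p = prior
    for _ in range(n_observations):
        p = posterior(p, p_obs_if_legal, p_obs_if_illegal)
    return p

# Hypothetical numbers: the observation (say, a trusted source playing the move)
# is 900x more likely if the move is legal than if it is illegal.
print(update(0.0, 0.9, 0.001, 3))    # zero prior: stays 0.0 forever
print(update(1e-6, 0.9, 0.001, 3))   # tiny prior: ends up close to 1
```

Each observation multiplies the odds by the same factor, so a one-in-a-million prior is overturned after a few strong observations, but a zero prior never moves.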
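On the contrary, allowing vectors for unrelated concepts to be only almost orthogonal makes it possible to represent a much larger number of unrelated concepts: a d-dimensional space has room for at most d exactly orthogonal vectors, but for exponentially many (in d) vectors that are all nearly orthogonal to one another. https://terrytao.wordpress.com/2013/07/18/a-cheap-version-of...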
In machine learning, this phenomenon is known as superposition, and it is one explanation for polysemanticity (single neurons responding to several unrelated features): https://transformer-circuits.pub/2022/toy_model/index.html
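A quick numerical check of the almost-orthogonality claim, assuming NumPy is available: random unit vectors in d dimensions have pairwise cosines concentrated around 0 with a spread of roughly 1/sqrt(d), so you can pack far more than d of them while keeping every pair close to orthogonal. The sizes and seed below are arbitrary choices for illustration.

```python
import numpy as np

def max_abs_cosine(n_vectors, dim, seed=0):
    """Largest |cosine similarity| among n random unit vectors in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n_vectors, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # normalize rows to unit length
    gram = v @ v.T                                  # all pairwise cosines
    np.fill_diagonal(gram, 0.0)                     # ignore each vector vs. itself
    return float(np.abs(gram).max())

# 2048 vectors in 512 dimensions: four times more than could be exactly orthogonal.
print(max_abs_cosine(2048, 512))
```

With these sizes the worst pair should still be well under 0.4 in magnitude, i.e. every pair is within about 25 degrees of perpendicular, and the bound tightens as `dim` grows.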