What's kind of amazing is that, in doing so, it actually learns to play chess! That is, the model weights naturally organize into something resembling an understanding of chess, just by trying to minimize error on next-token prediction.
It makes sense, but it's still kind of astonishing that it actually works.