zlacker

[return to "Chess-GPT's Internal World Model"]
1. wavemo+tm1[view] [source] 2024-01-07 05:16:56
>>homarp+(OP)
If you take a neural network that already knows the basic rules of chess and train it on chess games, you produce a chess engine.

From the Wikipedia page on one of the strongest ever[1]: "Like Leela Zero and AlphaGo Zero, Leela Chess Zero starts with no intrinsic chess-specific knowledge other than the basic rules of the game. Leela Chess Zero then learns how to play chess by reinforcement learning from repeated self-play"

[1]: https://en.wikipedia.org/wiki/Leela_Chess_Zero

◧◩
2. btown+Np1[view] [source] 2024-01-07 05:59:27
>>wavemo+tm1
As described in the OP's blog post (https://adamkarvonen.github.io/machine_learning/2024/01/03/c...), one of the remarkable things here is that the standard GPT architecture, trained from scratch on PGN strings alone, can intuit the rules of the game from those examples, without any built-in notion of chess or even of the fact that it is playing a game.

Leela, by contrast, pairs its network with a specialized iterative tree search to generate move recommendations: https://lczero.org/dev/wiki/technical-explanation-of-leela-c...

Which is not to diminish the work of the Leela team at all! But I find it fascinating that an unmodified GPT architecture can build up internal neural representations that correspond closely to board states, despite not having been designed for that task. As they say, attention may indeed be all you need.
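To make the "PGN strings alone" point concrete, here is a minimal sketch of the kind of supervision such a model gets: raw game text turned into (prefix, next-token) pairs. Character-level tokenization and the helper names are assumptions for illustration, not the blog post's actual code.

```python
def build_vocab(games):
    """Map every character appearing in the corpus to an integer id."""
    chars = sorted(set("".join(games)))
    stoi = {c: i for i, c in enumerate(chars)}
    itos = {i: c for c, i in stoi.items()}
    return stoi, itos

def next_token_pairs(game, stoi):
    """At each position the model sees the prefix and must predict the
    next character; the board state is never given explicitly."""
    ids = [stoi[c] for c in game]
    return ids[:-1], ids[1:]

games = ["1.e4 e5 2.Nf3 Nc6 3.Bb5 a6"]
stoi, itos = build_vocab(games)
x, y = next_token_pairs(games[0], stoi)
assert len(x) == len(y) == len(games[0]) - 1
```

Everything the model could learn about legal moves or board state has to be reconstructed from statistics over sequences like these.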

◧◩◪
3. banana+lC1[view] [source] 2024-01-07 08:53:32
>>btown+Np1
> can intuit the rules of the game from those examples,

I am pretty sure a bunch of matrix multiplications can't intuit anything.

Naively, it doesn't seem very surprising that enormous amounts of self-play cause the internal structure to reflect the inputs and outputs?

◧◩◪◨
4. golol+bF1[view] [source] 2024-01-07 09:37:27
>>banana+lC1
> I am pretty sure a bunch of matrix multiplications can't intuit anything.

I don't understand how people can say things like this when universal approximation is an easy thing to prove. You could reproduce Magnus Carlsen's exact chess-playing stochastic process with a bunch of matrix multiplications and nonlinear activations, up to arbitrarily small error.
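The constructive half of that argument can be sketched directly: a one-hidden-layer ReLU network whose weights are chosen by hand to reproduce a piecewise-linear interpolant of any continuous function, with error shrinking as the number of units grows. This is a numpy illustration with hypothetical helper names, not a claim about how any real chess model is built.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_interpolant(f, a, b, n):
    """Return a one-hidden-layer ReLU network (weights chosen in closed
    form) that equals the piecewise-linear interpolant of f on [a, b]
    with n segments."""
    knots = np.linspace(a, b, n + 1)
    vals = f(knots)
    slopes = np.diff(vals) / np.diff(knots)
    # Each hidden unit's coefficient is the change in slope at its knot.
    coeffs = np.concatenate(([slopes[0]], np.diff(slopes)))

    def g(x):
        x = np.asarray(x, dtype=float)
        return vals[0] + relu(x[..., None] - knots[:-1]) @ coeffs

    return g

g = relu_interpolant(np.sin, 0.0, 2 * np.pi, 100)
xs = np.linspace(0.0, 2 * np.pi, 10_000)
err = np.max(np.abs(g(xs) - np.sin(xs)))
print(err)  # below 1e-3; shrinks roughly like 1/n^2
```

That only shows the approximation exists; learning such weights from data, as the GPT in the OP apparently does, is the harder and more interesting part.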

◧◩◪◨⬒
5. ben_w+ZF1[view] [source] 2024-01-07 09:51:22
>>golol+bF1
I read such statements as being claims that "intuition" is part of consciousness etc.

It's still too strong a claim, given that matrix multiplication also describes quantum mechanics, and by extension chemistry, biology, and our own brains… But I frequently see two related concepts mistaken for synonyms, and I assume the intended claim here is the weaker one: that LLMs are not conscious.

Me, I think the word "intuition" is fine, just as I'd say that a tree falling in a forest with no one to hear it does produce a sound, because sound is the vibration of the air rather than the qualia.

◧◩◪◨⬒⬓
6. edgyqu+Ru2[view] [source] 2024-01-07 17:22:09
>>ben_w+ZF1
No, matrix multiplication is the formalism humans use to make predictions about those things, but it doesn't describe their fundamental structure, and there's no reason to imply that it does.