zlacker

[return to "Chess-GPT's Internal World Model"]
1. sinuhe+4Z[view] [source] 2024-01-07 00:58:10
>>homarp+(OP)
"World model" might be too big a word here. When we talk of a world model (in the context of AI models), we refer to the model's understanding of the world, at least in the context it was trained on. But what I see is just a visualization of the output in a fashion similar to a chess board. Stronger evidence would be, for example, a map of the next-move probabilities, which would show whether it truly understood the game's rules. If it shows probability larger than zero on illegal board squares, that would show us why it sometimes makes illegal moves. And obviously, it didn't fully understand the rules of the game.
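
Roughly the kind of check I mean, as a sketch (python-chess for the legality test; move_probs stands in for whatever distribution the model outputs over next moves -- an assumed interface, not a real Chess-GPT API):

    import chess

    def illegal_mass(board: chess.Board, move_probs: dict[str, float]) -> float:
        """Probability mass the model puts on moves that are illegal in this position."""
        legal = {m.uci() for m in board.legal_moves}
        return sum(p for move, p in move_probs.items() if move not in legal)

    board = chess.Board()  # starting position
    # Made-up distribution for illustration; e2e5 is not a legal first move.
    move_probs = {"e2e4": 0.45, "d2d4": 0.40, "e2e5": 0.15}
    print(illegal_mass(board, move_probs))  # 0.15

A map of that mass, position by position, would show exactly where the model's notion of the rules breaks down.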
◧◩
2. mitthr+n61[view] [source] 2024-01-07 02:09:45
>>sinuhe+4Z
> probability larger than zero

Strictly speaking, it would be a mistake to assign a probability of exactly zero to any move, even an illegal one, and especially for an AI that learns by example and self-play. It never gets taught the rules, it only gets shown the games -- there's no reason it should conclude that the probability of a rook moving diagonally is exactly zero just because it has never seen it happen in the data and gets penalized in training every time it tries it.

But even for a human, assigning a probability of exactly zero is too strong. It would forbid any possibility that you misunderstand a rule or forgot a special case. It's a good idea to always maintain at least a small amount of epistemic humility about the rules, so that sufficiently overwhelming evidence could convince you that a move you thought was illegal is actually legal.
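
A minimal sketch of why, with made-up logits: a softmax over finite scores is strictly positive everywhere, so the most the model can do is push an illegal move's probability very close to zero, never exactly to it.

    import numpy as np

    # Four candidate moves; the last is heavily suppressed (say, a rook moving
    # diagonally), yet softmax still leaves it a strictly positive probability.
    logits = np.array([5.0, 3.2, -12.0, -40.0])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    print(probs)  # last entry is on the order of 1e-20: tiny, but not zero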

◧◩◪
3. redcob+Kb1[view] [source] 2024-01-07 03:06:26
>>mitthr+n61
There's got to be a probability cut-off, though. LLMs don't connect every token with every other token; some pairs aren't connected at all, even if other associations are learned, right?
◧◩◪◨
4. the847+Hd1[view] [source] 2024-01-07 03:29:35
>>redcob+Kb1
The weights have finite precision, which means they represent value ranges / have error bars. So even if a weight is exactly 0, that doesn't represent complete confidence that the thing it encodes never occurs.
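
For example, in float16 (just to make the effect easy to see):

    import numpy as np

    # A low-precision weight can't distinguish "exactly zero" from
    # "too small for this format to represent".
    print(np.float16(1e-9) == 0.0)    # True: 1e-9 underflows to an exact 0
    print(np.float16(2049.0))         # 2048.0: representable values are 2 apart here
    print(np.finfo(np.float16).tiny)  # ~6.1e-05, the smallest normal float16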
◧◩◪◨⬒
5. redcob+jr1[view] [source] 2024-01-07 06:21:28
>>the847+Hd1
A weight presupposes a relationship, but I'm arguing LLMs don't create all relationships, so for some pairs the connection wouldn't even exist.
◧◩◪◨⬒⬓
6. yorwba+X42[view] [source] 2024-01-07 14:29:47
>>redcob+jr1
When relationships are represented implicitly by the magnitude of the dot product between two vectors, there's no particular advantage to not "creating" all relationships (i.e. enforcing orthogonality for "uncreated" relationships).

On the contrary, by allowing vectors for unrelated concepts to be only almost orthogonal, it's possible to represent a much larger number of unrelated concepts. https://terrytao.wordpress.com/2013/07/18/a-cheap-version-of...

In machine learning, this phenomenon is known as polysemanticity or superposition https://transformer-circuits.pub/2022/toy_model/index.html
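
A quick illustration of the "almost orthogonal" point, using random vectors rather than learned ones: 128 dimensions only admit 128 exactly orthogonal directions, but thousands of random unit vectors are already pairwise nearly orthogonal.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 128, 4000                     # 4000 "concepts" in a 128-dim space
    vecs = rng.normal(size=(n, d))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit vectors

    cos = vecs @ vecs.T                  # pairwise cosine similarities
    np.fill_diagonal(cos, 0.0)           # ignore self-similarity
    print(np.abs(cos).mean())            # ~0.07: a typical pair is nearly orthogonal
    print(np.abs(cos).max())             # ~0.5: even the worst pair is far from parallel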

◧◩◪◨⬒⬓⬔
7. redcob+ua2[view] [source] 2024-01-07 15:09:37
>>yorwba+X42
That’s not right; there are many vectors that go unbuilt between unrelated tokens. Creating a ton of empty relationships would obviously generate an immense amount of useless data.

Your links aren't about exactly orthogonal vectors, so they're not relevant. Also, that's not how superposition is defined in your own links:

> In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions. We call this phenomenon superposition

[go to top]