zlacker

[return to "Beyond Semantics: Unreasonable Effectiveness of Reasonless Intermediate Tokens"]
1. nullc+I4 2025-05-23 16:48:32
>>nyrikk+(OP)
Even when you train an AI on human language, the tokens can carry "subtext" that is legible only to the AI. And, unfortunately, it's not legible to the AI in any way that it could ever explain to us.

It's no different from how, in English, we can signal through particular word and phrase choices that a statement carries a political slant or that it's about sex.

Training for reasoning should be expected to amplify that subtext, since any random noise in token selection that happens, by chance, to be correlated with correct results will get reinforced.
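
To make that concrete, here's a toy illustration (my own, not from the paper): a bandit-style REINFORCE update on a single "filler" token position whose choice has no causal effect on correctness. Because a finite batch of rollouts happens to correlate some filler tokens with correct answers, the reward-weighted updates pull the policy away from uniform and toward those chance correlations.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 8                      # candidate filler tokens for one trace position
logits = np.zeros(VOCAB)       # policy over that position, initially uniform

# One finite batch of rollouts: the filler token is chosen uniformly and the
# 0/1 outcome reward is drawn independently of it, so any correlation between
# the two is pure chance.
N = 64
tokens = rng.integers(0, VOCAB, size=N)
rewards = (rng.random(N) < 0.5).astype(float)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 1.0
for _ in range(500):
    probs = softmax(logits)
    grad = np.zeros(VOCAB)
    # REINFORCE-style estimate on the fixed batch: reward-weighted gradient
    # of log pi(token), with no baseline.
    for t, r in zip(tokens, rewards):
        grad += r * (np.eye(VOCAB)[t] - probs)
    logits += lr * grad / N

cond = np.bincount(tokens[rewards == 1.0], minlength=VOCAB) / rewards.sum()
print("p(token | correct) in the batch:", np.round(cond, 2))
print("learned policy:                 ", np.round(softmax(logits), 2))
# The learned policy mirrors the batch's chance correlations instead of
# staying uniform: filler tokens that merely happened to co-occur with
# correct answers are now preferred.
```

The same pressure, applied across millions of trace positions, is one way illegible subtext could accumulate.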

Perhaps you could try to dampen this by training two distinct models for a while, then swapping their reasoning traces for a while before going back -- but sadly, distinct models may still end up with similar subtexts due to correlations in their training data. Maybe models with very different tokenizations would be less likely to.
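
For what it's worth, here's a minimal sketch of one way to read that swap schedule, with everything hypothetical: the ToyReasoner class, reward_fn, and the swap_every parameter are placeholders standing in for whatever RL-for-reasoning setup is actually used. The only point is the structure -- every few epochs each model is rewarded and updated on the *other* model's reasoning traces, so subtext that only one of them can decode stops paying off.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ToyReasoner:
    """Placeholder for an LLM policy; a real one would sample and learn."""
    name: str
    history: List[str] = field(default_factory=list)

    def generate(self, problem: str) -> Tuple[str, str]:
        # Hypothetical: sample a reasoning trace and a final answer.
        return f"[{self.name}'s trace for {problem}]", f"answer({problem})"

    def update_on(self, problem: str, trace: str, reward: float) -> None:
        # Hypothetical: a reward-weighted update on (problem, trace).
        self.history.append(f"{problem} | {trace} | r={reward}")

def reward_fn(problem: str, answer: str) -> float:
    # Hypothetical outcome reward: 1.0 iff the final answer checks out.
    return 1.0

def train_with_swaps(problems: List[str], swap_every: int = 2, epochs: int = 8) -> None:
    a, b = ToyReasoner("A"), ToyReasoner("B")
    for epoch in range(epochs):
        # Alternate between "own traces" and "swapped traces" phases.
        swapped = (epoch // swap_every) % 2 == 1
        for p in problems:
            trace_a, ans_a = a.generate(p)
            trace_b, ans_b = b.generate(p)
            if swapped:
                # Each model is rewarded and updated on the other's trace,
                # so privately legible subtext stops being useful.
                a.update_on(p, trace_b, reward_fn(p, ans_b))
                b.update_on(p, trace_a, reward_fn(p, ans_a))
            else:
                a.update_on(p, trace_a, reward_fn(p, ans_a))
                b.update_on(p, trace_b, reward_fn(p, ans_b))

train_with_swaps(["2+2", "17*3"])
```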

2. nihaku+X7 2025-05-23 17:12:47
>>nullc+I4
This is such a bonkers line of thinking; I'm so intrigued. So a particular model will have an entire 'culture' available or understandable only to itself. Seems kind of lonely. Like some symbols might activate together for reasons that are totally incomprehensible to us, but make perfect sense to the model. I wonder if an approach like the one in https://www.anthropic.com/research/tracing-thoughts-language... could ever give us insight into any 'inside jokes' present in the model.

I hope that research into understanding LLM qualia eventually allows us to understand, e.g., what it's like to [be a bat](https://en.wikipedia.org/wiki/What_Is_It_Like_to_Be_a_Bat%3F)
