zlacker

[return to "Beyond Semantics: Unreasonable Effectiveness of Reasonless Intermediate Tokens"]
1. nullc+I4 2025-05-23 16:48:32
>>nyrikk+(OP)
Even when you train AI on human language, the tokens can have "subtext" that is only legible to the AI. And, unfortunately, it's not even legible to the AI in a way that it could ever explain to us.

It's no different than how in English we can signal that a statement is related to a kind of politics or that it's about sex through particular word and phrase choice.

Training for reasoning should be expected to amplify the subtext, since any random noise in the selection that by chance is correlated with the right results will get amplified.

Perhaps you could try to dampen this by training two distinct models for a while, then swapping their reasoning for a stretch before switching back-- but sadly distinct models may still end up with similar subtexts due to correlations in their training data. Maybe ones with very distinct tokenization would be less likely to do so.

◧◩
2. candid+z9 2025-05-23 17:25:32
>>nullc+I4
IMO this is why natural language will always be a terrible _interface_--because English is a terrible _language_ where words can have wildly different meanings that change over time. There's no ambiguity of intent with traditional UX (or even programming languages).
◧◩◪
3. nullc+Zl 2025-05-23 18:57:41
>>candid+z9
It can happen more or less no matter what language the model uses, so long as it's reinforcement trained. It's just that in English we have the illusion that we understand the meaning.

An example of this is toki pona, a minimalist constructed human language that is designed to only express "positive thinking". Yet it is extremely easy to insult people in toki pona: e.g. sina toki li pona pona pona pona. (you are speaking very very very very well).

To be free of a potential subtext sidechannel there can be essentially no equivalent outputs.
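
A minimal sketch of the sidechannel point, as an illustration only: if two outputs read as interchangeable to a human, the choice between them can carry one hidden bit. The synonym pairs and the `encode`/`decode` names are hypothetical, made up for this example and not taken from any model or paper.

```python
# Hypothetical pairs of words that a human reader treats as equivalent.
SYNONYM_PAIRS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
    ("maybe", "perhaps"),
]


def encode(bits):
    """Pick one word from each pair; which one is picked hides one bit."""
    if len(bits) > len(SYNONYM_PAIRS):
        raise ValueError("not enough equivalent choices to hide that many bits")
    return [SYNONYM_PAIRS[i][b] for i, b in enumerate(bits)]


def decode(words):
    """Recover the hidden bits from which synonym was chosen."""
    return [SYNONYM_PAIRS[i].index(w) for i, w in enumerate(words)]


hidden = [1, 0, 1, 1]
words = encode(hidden)
print(words)          # the surface text a human sees
print(decode(words))  # the bits a reader who knows the pairs recovers
```

With k such pairs there are 2^k surface forms that read the same, which is why closing the channel requires that no two outputs be equivalent.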

◧◩◪◨
4. pona-a+Ny 2025-05-23 20:18:36
>>nullc+Zl
Can't you just say "sina toki ike suli a." (you are speaking very badly <exclamation>)? Just because it doesn't have official swearwords like most natural languages doesn't mean you can only express "positive thinking".