1. klipt+(OP) 2025-04-05 17:14:55
> LLMs learn from examples where the training targets are not probabilities, but how a given sentence actually continues (only one token is set to 1).

But enough data implies probabilities. Consider 2 sentences:

"For breakfast I had oats"

"For breakfast I had eggs"

After training on this data, how should a model complete "For breakfast I had..."?

There is no best deterministic answer. The best answer is a 50/50 probability distribution over "oats" and "eggs".
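
To make that concrete, here's a minimal sketch (toy code, the names are illustrative): a single logit vector trained with cross-entropy on the two one-hot targets settles at exactly the empirical 50/50 split, because the averaged gradient is the softmax minus the empirical token frequencies.

    import numpy as np

    vocab = ["oats", "eggs"]
    dataset = ["oats", "eggs"]            # next-token labels after "For breakfast I had"
    counts = np.array([dataset.count(w) for w in vocab], dtype=float)
    empirical = counts / counts.sum()     # [0.5, 0.5]

    rng = np.random.default_rng(0)
    logits = rng.normal(size=len(vocab))  # logits for the context "For breakfast I had"
    lr = 0.5

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    for _ in range(2000):
        p = softmax(logits)
        # full-batch cross-entropy gradient wrt the logits: the mean over
        # examples of (softmax - one_hot), i.e. softmax minus the empirical
        # frequency of each token in the training data
        logits -= lr * (p - empirical)

    for word, p in zip(vocab, softmax(logits)):
        print(f"{word}: {p:.3f}")         # oats: 0.500, eggs: 0.500

The zero-gradient point is softmax(logits) == empirical, so more "oats" examples than "eggs" would pull the distribution toward their relative counts rather than pick a single winner.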

replies(1): >>ksec+V8
2. ksec+V8 2025-04-05 18:28:18
>>klipt+(OP)
So it is still largely probabilistic pattern matching?
replies(1): >>klipt+BY1
3. klipt+BY1 2025-04-06 16:15:22
>>ksec+V8
You can model the whole universe with probabilities!