zlacker

1. bcoate+(OP) 2025-05-23 22:21:55
Either I'm wildly misunderstanding or that can't possibly be true: if you sample at high temperature and it picks a very-low-probability token, the model continues consistently with the chosen token, not with the more likely ones.
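A minimal sketch of the sampling loop being described, assuming a toy setup: `next_logits` is a hypothetical stand-in for the model (context in, one logit per vocabulary entry out), and the other function names are illustrative too. Temperature only reshapes the distribution a token is drawn from; once drawn, the token is appended to the context like any other, and every later step conditions on it.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(next_logits, prompt, steps, temperature, seed=0):
    """Autoregressive loop: each sampled token is appended to the context,
    so the next call to next_logits sees it whether or not it was likely."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        logits = next_logits(tokens)  # hypothetical model: context -> logits
        tokens.append(sample_token(logits, temperature, rng))
    return tokens
```

With logits `[2.0, 0.0]`, the second token gets roughly 2% probability at temperature 0.5 but roughly 40% at temperature 5.0, which is why high temperature makes unlikely picks common.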
2. valine+A 2025-05-23 22:29:46
>>bcoate+(OP)
Attention computes a weighted average over all previous latents. So yes, the sampled token is a new input to the forward pass, but after it passes through an attention head, its representation contains a little bit of every previous latent.
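A minimal single-head sketch of that weighted average, assuming the queries, keys and values are handed in directly (in a real transformer they are learned linear projections of each position's latent, and this code skips those projections, multiple heads and the residual stream). The causal mask limits position t to positions 0..t, and because softmax of finite scores is strictly positive, every earlier position gets some nonzero weight.

```python
import math

def softmax(xs):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.
    The output at position t is a weighted average of values[0..t]."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for t, q in enumerate(queries):
        # Score only positions j <= t (the causal mask).
        scores = [dot(q, keys[j]) / math.sqrt(d) for j in range(t + 1)]
        weights = softmax(scores)
        # Mix the value vectors of all visible positions using those weights.
        out = [sum(w * values[j][k] for j, w in enumerate(weights))
               for k in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

For the newest position, `all_weights[-1]` holds one strictly positive weight per earlier position, so its output vector blends in something from each of them.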