zlacker

[parent] [thread] 1 comments
1. aiiizz+(OP)[view] [source] 2025-05-24 11:46:44
Is that really true? E.g. anthropic said that the model can make decisions about all the tokens, before a single token is produced.
replies(1): >>valine+5s
2. valine+5s[view] [source] 2025-05-24 16:43:41
>>aiiizz+(OP)
That’s true yeah. The model can do that because calculating latents is independent of next token prediction. You do a forward pass for each token in your sequence without the final projection to logits.
[go to top]