zlacker
[parent]
[thread]
1 comments
1. aiiizz+(OP)
[view]
[source]
2025-05-24 11:46:44
Is that really true? E.g. anthropic said that the model can make decisions about all the tokens, before a single token is produced.
replies(1):
>>valine+5s
◧
2. valine+5s
[view]
[source]
2025-05-24 16:43:41
>>aiiizz+(OP)
That’s true yeah. The model can do that because calculating latents is independent of next token prediction. You do a forward pass for each token in your sequence without the final projection to logits.
[go to top]