zlacker
[parent]
[thread]
0 comments
1. valine+(OP)
[view]
[source]
2025-05-24 16:43:41
That’s true yeah. The model can do that because calculating latents is independent of next token prediction. You do a forward pass for each token in your sequence without the final projection to logits.
[go to top]