zlacker
1. valine (OP)
2025-05-23 22:29:46
Attention computes a weighted average of all previous latents. So yes, it’s a new token as input to the forward pass, but after it feeds through an attention head it contains a little bit of every previous latent.
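The mixing described above can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration of causal self-attention, not any particular model's implementation: each position's output is a softmax-weighted average of the value vectors at that position and every position before it, so every earlier latent contributes a nonzero fraction.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head causal attention: output[t] is a weighted
    average of v[0..t], with weights from softmax(q·k)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)            # (T, T) similarity scores
    mask = np.tril(np.ones_like(scores))     # causal mask: no peeking ahead
    scores = np.where(mask == 1, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v                       # weighted average of past values

# Toy example with made-up shapes (T positions, d dims per latent).
T, d = 4, 8
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
out = causal_attention(q, k, v)
```

Position 0 can only attend to itself, so `out[0]` equals `v[0]` exactly; every later row blends in all preceding value vectors.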