The dimensionality depends on the vocab size and your hidden dimension size, but that’s not really relevant here: it’s a single linear projection from latents to logits.
Reasoning is definitely not happening in the linear projection to logits if that’s what you mean.
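For concreteness, here’s a minimal sketch of that projection, using toy sizes (the 768/50257 numbers are just GPT-2-like placeholders, not from the discussion above):

```python
import torch

# Toy sizes: hidden dimension and vocabulary size (GPT-2-like, purely illustrative)
hidden_dim, vocab_size = 768, 50257

# The unembedding / lm_head is just one linear layer, often without bias
lm_head = torch.nn.Linear(hidden_dim, vocab_size, bias=False)

# Final hidden state (latent) for a single token position
latents = torch.randn(1, hidden_dim)

# One matmul turns the latent into a score per vocabulary token
logits = lm_head(latents)
print(logits.shape)  # (1, vocab_size)
```

There’s no nonlinearity, no iteration, no intermediate state here, which is the point: whatever computation produces the answer has to happen in the transformer layers before this step.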