1. barbeg+(OP) 2024-10-16 16:04:17
Does anyone know why Mistral uses a 17-bit (131k-token) vocabulary? I'm sure it's more efficient at encoding text, but each token ID doesn't fit into a 16-bit register. Doesn't that make it less efficient computationally?
2. cpldcp+S61 2024-10-16 23:40:07
>>barbeg+(OP)
The token IDs are immediately mapped to embeddings (high-dimensional vectors of floating-point numbers) by a table lookup, so the 17-bit values are only used as row indices and never enter any arithmetic.
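To make the point concrete, here is a minimal pure-Python sketch of an embedding lookup. It assumes a vocabulary of exactly 2**17 = 131,072 IDs and a hypothetical, randomly initialised table with a toy width of 4 (real models use thousands of dimensions); it is an illustration of the general technique, not Mistral's actual code.

```python
import random

VOCAB_SIZE = 131_072  # 2**17, the "131k" vocabulary from the question
D_MODEL = 4           # toy embedding width (assumption); real models are far wider

# Hypothetical embedding table: one row of D_MODEL floats per token ID.
rng = random.Random(0)
embedding_table = [[rng.gauss(0.0, 0.02) for _ in range(D_MODEL)]
                   for _ in range(VOCAB_SIZE)]

def embed(token_ids):
    """Map integer token IDs to their embedding vectors by row lookup."""
    return [embedding_table[t] for t in token_ids]

# The largest ID needs 17 bits, so it would not fit in a 16-bit integer...
max_id = VOCAB_SIZE - 1
print(max_id.bit_length())  # 17

# ...but it is only used as a list index; any arithmetic that follows
# operates on the float vectors returned by embed().
vectors = embed([0, 42, max_id])
print(len(vectors), len(vectors[0]))  # 3 4
```

The ID's width only affects how it is stored and indexed, not the cost of the model's floating-point work. In common frameworks, token IDs are typically held in 32- or 64-bit integer tensors anyway.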