zlacker

[return to "Un Ministral, Des Ministraux"]
1. barbeg+ok[view] [source] 2024-10-16 16:04:17
>>veggie+(OP)
Does anyone know why Mistral use a 17 bit (131k) vocabulary? I'm sure it's more efficient at encoding text but each token doesn't fit into a 16 bit register which must make it more inefficient computationally?
◧◩
2. cpldcp+gr1[view] [source] 2024-10-16 23:40:07
>>barbeg+ok
The tokens are immediately transformed into embeddings (very large vectors), so the 17 bit values are not used for any computation.
[go to top]