one more thing, that guide says:
> You can choose UD-Q4_K_XL or other quantized versions.
I see eight different 4-bit quants (I assume that's the size I want?). How do I pick which one to use?
IQ4_XS
Q4_K_S
Q4_1
IQ4_NL
MXFP4_MOE
Q4_0
Q4_K_M
Q4_K_XL

Also, depending on how much regular system RAM you have, you can offload mixture-of-experts models like this, keeping only the most important layers on your GPU. This may let you run larger, more accurate quants. That functionality is supported by llama.cpp and other frameworks, and it's worth looking into how to do it.
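For what it's worth, here's a minimal sketch of the simplest form of this using the llama-cpp-python binding. The model path and layer count are assumptions you'd tune to your own hardware; `n_gpu_layers` controls how many layers go to the GPU while the rest stay in system RAM. (The llama.cpp CLI exposes the same split via `-ngl`, and newer builds also have tensor-override options for pinning MoE expert weights to CPU specifically.)

```python
# Minimal sketch: partial GPU offload with llama-cpp-python.
# The filename and layer count below are hypothetical; adjust to your setup.
from llama_cpp import Llama

llm = Llama(
    model_path="models/MyModel-UD-Q4_K_XL.gguf",  # hypothetical path
    n_gpu_layers=20,  # keep this many layers on the GPU; the rest stay in system RAM
    n_ctx=4096,       # context window size
)

# Quick smoke test to confirm the model loads and generates.
out = llm("Q: What is 2 + 2? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

The trade-off is speed: layers left in system RAM run on the CPU, so generation slows down as you offload more, but it can be worth it if it lets a bigger or higher-quality quant fit.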