I would recommend trying llama.cpp's llama-server with models of increasing size until you hit the best quality/speed tradeoff you're willing to accept on your hardware.
The Unsloth guides are a great place to start: https://unsloth.ai/docs/models/qwen3-coder-next#llama.cpp-tu...
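To make that concrete, here's a rough sketch of pulling a quant straight from Hugging Face with llama-server's `-hf` flag (the repo name and quant tag below are illustrative placeholders, not a specific recommendation -- substitute whichever Unsloth repo and quant you settle on):

```shell
# Download (if not cached) and serve a GGUF quant from Hugging Face.
# Repo and :quant tag are placeholders -- swap in the one you want.
llama-server \
  -hf unsloth/SomeModel-GGUF:Q4_K_M \
  --ctx-size 16384 \
  --port 8080
# Then point any OpenAI-compatible client at http://localhost:8080/v1
```

Start small, check tokens/sec and output quality, then step up a size and repeat.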
One more thing, that guide says:
> You can choose UD-Q4_K_XL or other quantized versions.
I see eight different 4-bit quants (I assume 4-bit is the size I want?). How do I pick which one to use?
IQ4_XS, Q4_K_S, Q4_1, IQ4_NL, MXFP4_MOE, Q4_0, Q4_K_M, Q4_K_XL