>>daniel+(OP)
Using lmstudio-community/Qwen3-Coder-Next-GGUF:Q8_0, I'm getting up to 32 tokens/s on Strix Halo, with room for 128k of context (out of the 256k the model supports).
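For anyone who wants to load the same GGUF outside LM Studio, here's a minimal sketch using llama-cpp-python; the file name below is illustrative (use whichever Q8_0 file you actually downloaded), and on Strix Halo's iGPU you'd want a Vulkan or ROCm build of the library for full offload:

```python
from llama_cpp import Llama

# Path is illustrative; point it at the Q8_0 file you downloaded.
llm = Llama(
    model_path="Qwen3-Coder-Next-Q8_0.gguf",
    n_ctx=131072,      # 128k of the model's 256k max context
    n_gpu_layers=-1,   # offload every layer to the GPU
)

out = llm("Write a Python function that reverses a string.", max_tokens=256)
print(out["choices"][0]["text"])
```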
From very limited testing, it seems to be only slightly worse than MiniMax M2.1 Q6 (a model about twice its size). I'm impressed.