zlacker

I got this running locally using llama.cpp from Homebrew and the Unsloth quantized model like this:

  brew upgrade llama.cpp # or brew install if you don't have it yet

Then:

  llama-cli \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
    --fit on \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40 \
    --jinja

That opened a CLI interface. For a web UI on port 8080 along with an OpenAI chat completions compatible endpoint do this:

  llama-server \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
    --fit on \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40 \
    --jinja

It's using about 28GB of RAM.

replies(2): >>techno+tt >>nubg+Ku

>>simonw+(OP)
what are your impressions?

replies(1): >>simonw+DQ

>>simonw+(OP)
what's the token per seconds speed?

>>techno+tt
I got Codex CLI running against it and was sadly very unimpressed - it got stuck in a loop running "ls" for some reason when I asked it to create a new file.

replies(3): >>daniel+na1 >>mhitza+SQ2 >>Camper+365

>>simonw+DQ
Yes sadly that sometimes happens - the issue is Codex CLI / Claude Code were designed for GPT / Claude models specifically, so it'll be hard for OSS models directly to utilize the full spec / tools etc, and might get loops sometimes - I would maybe try the MXFP4_MOE quant to see if it helps, and maybe try Qwen CLI (was planning to make a guide for it as well)

I guess until we see the day OSS models truly utilize Codex / CC very well, then local models will really take off

>>simonw+DQ
I would recommend you fiddle with the repeat penalty flags. I use local models often, and almost all I've tried needed that to prevent loops.

I'd also recommend dropping temperature down to 0. Any high temperature value feels like instructing the model "copy this homework from me but don't make it obvious".

>>simonw+DQ
You probably have seen it by now, but there was a llama.cpp issue that was fixed earlier today(?) to avoid looping and other sub-par results. Need to update llama-server as well as redownload the GGUFs (for certain quants).

https://old.reddit.com/r/unsloth/comments/1qvt6qy/qwen3coder...

replies(1): >>simonw+hc5

>>Camper+365
I hadn't seen that, thanks very much!