zlacker

[return to "Qwen3-Coder-Next"]
1. simonw+l3[view] [source] 2026-02-03 16:15:21
>>daniel+(OP)
This GGUF is 48.4GB - https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF/tree/main/... - which should be usable on higher-end laptops.

I still haven't found a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude Code well enough to be useful.

Maybe this will be the one? This Unsloth guide from a sibling comment suggests it might be: https://unsloth.ai/docs/models/qwen3-coder-next
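
For what it's worth, the margin on a 64GB machine is tighter than it looks, because macOS caps how much unified memory the GPU can wire by default. A rough back-of-envelope sketch, where the ~75% cap and the KV-cache allowance are my assumptions rather than measured figures:

    # Will a 48.4 GB GGUF fit on a 64 GB Mac? (all figures approximate)
    total_ram_gb = 64
    gpu_budget_gb = total_ram_gb * 0.75   # assumed macOS default wired-memory cap
    weights_gb = 48.4                     # the GGUF linked above
    kv_cache_gb = 4.0                     # guessed allowance for agent-sized context

    print(f"GPU budget: ~{gpu_budget_gb:.0f} GB")             # ~48 GB
    print(f"Needed:     ~{weights_gb + kv_cache_gb:.1f} GB")  # ~52.4 GB

So it's borderline: you'd likely need to raise the wired-memory limit or pick a smaller quant to leave room for the agent's context.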

2. codazo+fm1[view] [source] 2026-02-03 22:00:07
>>simonw+l3
I can't get Codex CLI or Claude Code to do tool calling with small local models. Those CLIs expect XML-style tool calls, while small local models have JSON tool use baked into them. No amount of prompting can fix it.

In a day or two I'll release my answer to this problem. But I'm curious: have you had a different experience where tool use works in one of these CLIs with a small local model?
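
To make the mismatch concrete, here's a rough illustration. Neither snippet is copied from Codex CLI or Claude Code; the exact formats vary by harness and model:

    # Illustrative only: the shapes of the two tool-call styles, not real prompts.

    # What an XML-style harness might expect the model to emit:
    xml_style = """
    <tool_call>
      <name>read_file</name>
      <arguments><path>src/main.py</path></arguments>
    </tool_call>
    """

    # What many small local models are fine-tuned to emit instead
    # (OpenAI-style JSON function calling):
    json_style = '{"name": "read_file", "arguments": {"path": "src/main.py"}}'

    # A parser expecting one shape treats the other as plain text, so the
    # "tool call" never fires no matter how you prompt.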

3. zackif+Aq1[view] [source] 2026-02-03 22:23:05
>>codazo+fm1
I'm using this model right now in Claude Code with LM Studio, perfectly, on a MacBook Pro.
4. codazo+hs1[view] [source] 2026-02-03 22:32:20
>>zackif+Aq1
You mean Qwen3-Coder-Next? I haven't tried that model itself yet because I assume it's too big for me. I have a modest 16GB MacBook Air, so I'm restricted to really small stuff. I'm thinking about buying a machine with a GPU to run some of these.

Anyway, maybe I should try some other models. The ones that haven't worked for tool calling for me are:

Llama3.1

Llama3.2

Qwen2.5-coder

Qwen3-coder

All of these at 7B, 8B, or sometimes (painfully) 30B.

I should also note that I'm typically using Ollama. Maybe LM Studio or llama.cpp somehow improves on this?
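
One quick way to check while staying on Ollama: it exposes an OpenAI-compatible endpoint, so a short script shows whether a given model emits a parseable tool call at all. A minimal sketch, where the model tag and the read_file tool are placeholders of my own:

    # Minimal tool-calling smoke test against a local Ollama server.
    # Assumes `pip install openai` and Ollama serving on its default port.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    tools = [{
        "type": "function",
        "function": {
            "name": "read_file",  # placeholder tool for the test
            "description": "Read a file from disk",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="qwen2.5-coder:7b",  # placeholder: any local model tag
        messages=[{"role": "user", "content": "Open src/main.py"}],
        tools=tools,
    )

    msg = resp.choices[0].message
    # If the model's tool-call format is one the server can parse, tool_calls
    # is populated; otherwise the attempt shows up as plain text in content.
    print(msg.tool_calls or msg.content)

If tool_calls comes back None and the JSON lands in content instead, that's the format mismatch from upthread.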

5. vessen+Ow2[view] [source] 2026-02-04 06:58:13
>>codazo+hs1
I'm mostly out of the local model game, but I can say confidently that Llama will be a waste of time for agentic workflows: as far as I know, it was trained before agentic fine-tuning was a thing. It's going to be tough for tool calling, probably regardless of the format you send the request in. Also, 8B models are tiny. You could significantly upgrade your inference quality and keep your privacy with, say, a machine at Lambda Labs or some cheaper provider, though. Probably for around $1/hr, where that hour buys many times more inference than an hour on your MBA.