zlacker

I run Qwen3-Coder-30B-A3B-Instruct (GGUF) with ik_llama on a VM with 13 GB RAM and a 6 GB RTX 2060 mobile GPU passed through to it, and I would describe it as usable, at least. It's running on an old (5 years, maybe more) Razer Blade laptop that has a broken display and 16 GB RAM.

I use opencode and have done a few toy projects and small changes in small repositories, and I get a pretty speedy and stable experience up to 64k context.

It would probably fall apart on larger projects, but I've often set tasks running on it, stepped away for an hour, and come back to a solution. It's definitely useful for smaller projects, scaffolding, basic bug fixes, extra UI tweaks, etc.

I don't think "usable" is a binary thing, though. I know you write a lot about this, but it'd be interesting to understand what you're asking the local models to do, and what it is about their output that you consider unusable on a relative monster of a laptop?

>>1dom+(OP)
I've had usable results with qwen3:30b, for what I was doing. There's definitely a knack to breaking the problem down enough for it.

What's interesting to me about this model is how good it allegedly is with no thinking mode. That's my main complaint about qwen3:30b, how verbose its reasoning is. For the size it's astonishing otherwise.

>>1dom+(OP)
Honestly I've been completely spoiled by Claude Code and Codex CLI against hosted models.

I'm hoping for an experience where I can tell my computer to do a thing (write some code, check for logged errors, find something in a bunch of files) and get an answer a few moments later.

Setting a task and then coming back to see if it worked an hour later is too much friction for me!

>>1dom+(OP)
The 30B-A3B model gives 13 t/s without a GPU (I noticed that tokens/sec × number of active params matches memory bandwidth).
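A rough back-of-envelope version of that relationship. This sketch assumes token generation is memory-bandwidth-bound and that each generated token reads every active weight once. The ~3B active-parameter figure is taken from the "A3B" in the model name. The bytes-per-weight values and the function names are hypothetical, and real numbers will depend on the quantization and runtime:

```python
# Back-of-envelope decode-speed estimate for a memory-bandwidth-bound LLM.
# Assumption: each generated token streams every *active* weight from RAM once,
# so bandwidth ~= tokens/sec * active_params * bytes_per_param.
# KV-cache reads, compute limits and CPU cache effects are ignored.


def implied_bandwidth_gbs(tokens_per_sec: float, active_params: float,
                          bytes_per_param: float) -> float:
    """Memory bandwidth (GB/s) needed to sustain a given decode speed."""
    return tokens_per_sec * active_params * bytes_per_param / 1e9


def estimated_tokens_per_sec(bandwidth_gbs: float, active_params: float,
                             bytes_per_param: float) -> float:
    """Rough upper bound on decode speed for a given memory bandwidth (GB/s)."""
    return bandwidth_gbs * 1e9 / (active_params * bytes_per_param)


if __name__ == "__main__":
    ACTIVE_PARAMS = 3e9  # "A3B": roughly 3B parameters active per token (assumed)
    # Hypothetical quantization levels, expressed as bytes per weight.
    for label, bpp in [("~8-bit", 1.0), ("~4-bit", 0.5)]:
        bw = implied_bandwidth_gbs(13, ACTIVE_PARAMS, bpp)
        print(f"{label}: 13 t/s implies about {bw:.1f} GB/s")
```

At ~8-bit that works out to about 39 GB/s, and at ~4-bit about 19.5 GB/s. Only the active parameters count here, which is why a MoE model like 30B-A3B can run at usable speeds on CPU while a dense 30B model would be roughly ten times slower.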