zlacker

The brand new Qwen3-Coder-Next runs at 300Tok/s PP and 40Tok/s on M1 64GB with 4-bit MLX quant. Together with Qwen Code (fork of Gemini) it is actually pretty capable.

Before that I used Qwen3-30B which is good enough for some quick javascript or Python, like 'add a new endpoint /api/foobar which does foobaz'. Also very decent for a quick summary of code.

It is 530Tok/s PP and 50Tok/s TG. If you have it spit out lots of the code that is just copy of the input, then it does 200Tok/s, i.e. 'add a new endpoint /api/foobar which does foobaz and return the whole file'