zlacker

1. cmrdpo+(OP) 2026-02-03 20:06:51
Yeah, I got 35-39 tok/sec for one-shot prompts, but for real-world longer-context interactions through opencode it seems to average out to 20-30 tok/sec. I tried both MXFP4 and Q4_K_XL; unfortunately, there was no big difference.

The --no-mmap and --fa on options seemed to help, but not dramatically.

As with everything Spark, memory bandwidth is the limitation.
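
For a rough sense of why bandwidth sets the ceiling: during decoding, every generated token has to stream the active weights from memory at least once, so tok/sec cannot exceed bandwidth divided by bytes read per token. Below is a minimal back-of-the-envelope sketch of that bound. The numbers are assumptions, not measurements: 273 GB/s is the published figure if "Spark" here means NVIDIA's DGX Spark, while the 10B active parameters and 4.5 bits per weight are hypothetical placeholders, and the function name is made up for illustration.

```python
def decode_tok_per_sec_ceiling(bandwidth_gb_s: float,
                               active_params_billion: float,
                               bits_per_weight: float,
                               extra_gb_per_token: float = 0.0) -> float:
    """Upper bound on decode speed when memory bandwidth is the bottleneck.

    bandwidth_gb_s        -- memory bandwidth in GB/s (1 GB = 1e9 bytes)
    active_params_billion -- parameters actually read per token, in billions
                             (for a MoE model this is the active subset)
    bits_per_weight       -- average bits per weight after quantization
    extra_gb_per_token    -- other per-token reads, e.g. the KV cache,
                             which grows with context length
    """
    weight_bytes = active_params_billion * 1e9 * bits_per_weight / 8
    bytes_per_token = weight_bytes + extra_gb_per_token * 1e9
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Assumed values only: 273 GB/s, 10B active params, ~4.5 bits/weight.
    short_ctx = decode_tok_per_sec_ceiling(273, 10, 4.5)
    # Same model with a hypothetical extra 2 GB of KV-cache reads per token.
    long_ctx = decode_tok_per_sec_ceiling(273, 10, 4.5, extra_gb_per_token=2.0)
    print(f"ceiling, short context: {short_ctx:.1f} tok/sec")
    print(f"ceiling, long context:  {long_ctx:.1f} tok/sec")
```

Real throughput lands below this bound, and the extra per-token term is one plausible reason longer-context sessions run slower than one-shot prompts.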

I'd like to be impressed with 30 tok/sec, but it's sort of a "leave it overnight and come back to the results" kind of experience; it wouldn't replace my normal agent use.

However, I suspect that in a few days/weeks DeepInfra.com and others will have this model (maybe Groq, too?) and will serve it faster and fairly cheaply.
