mark_l (OP) | 2023-11-19 04:53:55
Another data point: I can (barely) run a 30B 4-bit quantized model on a Mac Mini with 32 GB of on-chip memory, but it runs slowly (a little under 10 tokens/second).

13B and 7B models run easily and much faster.
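For anyone wanting to reproduce this kind of measurement, here's a minimal sketch assuming llama-cpp-python with a 4-bit GGUF quant (the model path and parameters are placeholders, not my exact setup):

    # Minimal sketch: load a 4-bit quantized GGUF model and estimate tokens/second.
    # Assumes llama-cpp-python; model path below is a placeholder.
    import time
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-30b.Q4_K_M.gguf",  # hypothetical 4-bit quant file
        n_gpu_layers=-1,  # offload all layers to Metal on Apple Silicon
        n_ctx=2048,
    )

    start = time.time()
    out = llm("Explain quantization in one paragraph.", max_tokens=128)
    elapsed = time.time() - start

    n_tokens = out["usage"]["completion_tokens"]
    print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tokens/s")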
