mark_l (OP) | 2023-11-19 04:53:55
Another data point: I can (barely) run a 30B 4-bit quantized model on a Mac Mini with 32 GB of on-chip memory, but it runs slowly (a little under 10 tokens/second).

13B and 7B models run easily and much faster.
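For anyone wanting to reproduce this kind of measurement, here's a minimal sketch assuming llama-cpp-python with a 4-bit GGUF quant (the model path and parameters are placeholders, not my exact setup):

    # Minimal sketch: load a 4-bit quantized GGUF model and estimate tokens/second.
    # Assumes llama-cpp-python; model path below is a placeholder.
    import time
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-30b.Q4_K_M.gguf",  # hypothetical 4-bit quant file
        n_gpu_layers=-1,  # offload all layers to Metal on Apple Silicon
        n_ctx=2048,
    )

    start = time.time()
    out = llm("Explain quantization in one paragraph.", max_tokens=128)
    elapsed = time.time() - start

    n_tokens = out["usage"]["completion_tokens"]
    print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tokens/s")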
