zlacker

[parent] [thread] 1 comments
1. minima+(OP)[view] [source] 2023-09-12 17:47:35
"Running" and "acceptable inference speed and quality" are two different constraints, particularly at scale/production.
replies(1): >>moonch+fW
2. moonch+fW[view] [source] 2023-09-12 21:13:20
>>minima+(OP)
I don't understand what you're trying to say ?

From what I've read 4090 should blow A100 away if you can fit within 22GB VRAM, which a 7B model should comfortably.

And the latency (along with variability and availability) on OpenAI API is terrible because of the load they are getting.

[go to top]