zlacker

Yeah, there are a lot of people who advocate for really slow inference on cheap infra. That's something else that should be expressed in this fidelity.

Honestly, 0.2 tps is no use to me for my use cases, although I've spoken with many people who are fine with numbers like that.

At least the people I've talked to say that if they have very high confidence that the model will succeed, they don't mind the wait.

Essentially, if the task failure rate is 1 in 10, I want to monitor and retry.

If it's 1 in 1000, then I can walk away.
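
To put rough numbers on that, here's a minimal sketch. It assumes attempts are independent, that a failure is only noticed at the end of a run (so each failure costs a full run), and that the job is 10,000 tokens long. The token count and the function names are made up for illustration.

```python
def expected_attempts(p_fail: float) -> float:
    """Mean number of runs until one succeeds, assuming independent attempts."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    return 1.0 / (1.0 - p_fail)


def expected_hours(tokens: int, tps: float, p_fail: float) -> float:
    """Expected wall-clock hours, charging a full run for every failed attempt."""
    seconds_per_run = tokens / tps
    return seconds_per_run * expected_attempts(p_fail) / 3600.0


if __name__ == "__main__":
    # Hypothetical 10,000-token job at 0.2 tokens per second.
    for p in (0.1, 0.001):
        print(f"p_fail={p}: {expected_hours(10_000, 0.2, p):.2f} hours expected")
```

Under those assumptions the retry overhead itself is small (about 11% extra time at 1 in 10, about 0.1% at 1 in 1000). The real difference is whether someone has to be around to notice the failure and kick off the retry.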

The reality is that most people don't have a handle on what this order of magnitude actually is for a given task. So unless you have high confidence in your confidence score, slow is useless.

But sometimes you do...

replies(1): >>zozbot+E1

>>kristo+(OP)
If you launch enough tasks in parallel, you aren't going to care that 1 in 10 failed, as long as the other 9 are good. Just rerun the failed jobs whenever you get around to it; the infra will still be getting plenty of utilization from the rest.
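
A rough sketch of the batch arithmetic, assuming each task fails independently with the same probability and that failed tasks are simply rerun until they pass. The function names are hypothetical.

```python
def expected_failures(n_tasks: int, p_fail: float) -> float:
    """Mean number of failed tasks in one parallel batch."""
    return n_tasks * p_fail


def prob_all_succeed(n_tasks: int, p_fail: float) -> float:
    """Chance that the whole batch passes on the first try."""
    return (1.0 - p_fail) ** n_tasks


def expected_total_runs(n_tasks: int, p_fail: float) -> float:
    """Mean number of runs, reruns included, until every task has succeeded."""
    return n_tasks / (1.0 - p_fail)


if __name__ == "__main__":
    n, p = 10, 0.1
    print("expected failures:", expected_failures(n, p))
    print("P(no failures):", prob_all_succeed(n, p))
    print("expected total runs:", expected_total_runs(n, p))
```

With 10 tasks at a 1 in 10 failure rate, a clean first pass is the exception rather than the rule, but the expected cost of the reruns is only about one extra run per ten tasks.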