It can make it more expensive if that option becomes popular.
But I think in most cases batching is actually the biggest _improvement_ in terms of cost effectiveness for operators, since it lets them use the GPU's parallel throughput more fully by handling multiple inference requests (often from different customers) in a single pass. (Unless they work like Bard by default.)
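To make that concrete, here's a minimal sketch (my own toy example, assuming PyTorch and an arbitrary small model, not any operator's actual serving stack): stacking several independent requests into one tensor lets one forward pass soak up the device's parallelism instead of running N serial passes.

```python
import torch
import torch.nn as nn

# Toy model standing in for whatever the operator actually serves.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# Pretend these arrive from different customers at roughly the same time.
requests = [torch.randn(512, device=device) for _ in range(32)]

with torch.no_grad():
    # Unbatched: one forward pass per request, mostly underutilizing the GPU.
    unbatched = [model(x.unsqueeze(0)) for x in requests]

    # Batched: stack the requests and run a single forward pass.
    batch = torch.stack(requests)   # shape: (32, 512)
    batched = model(batch)          # same math, much better utilization
```

Real serving systems do this continuously (collecting requests over a short window, or interleaving them mid-generation), but the cost argument is the same: the marginal cost of adding a request to a batch is far lower than running it alone.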