The efficiency of finetuned models is quite, quite a bit improved at the cost of giving up the rest of the world to do specific things, and disk space to have a few dozen local finetunes (or even hundreds+ for SaaS services) is peanuts compared to acquiring 80GB of VRAM on a single device for monomodels