
1. dougmw (OP) 2022-05-24 11:15:00
There's been a lot of prior work on shrinking these models from datacenter scale down to a single GPU. Given continued progress in that area and improving GPU performance, it seems like only a matter of years before inference is cheap and local even for the most impressive generative models.