I know the concept has been around for a while, but I have no idea whether it actually means anything in practice. I assume people are targeting the ones in common devices like Apple's, but what about here?
Even if it worked, though, they're usually so heavily memory-bandwidth bottlenecked that they're near useless for LLM inference: token-by-token generation is bound by how fast the weights can be streamed from memory, and the NPU sits on the same memory bus as everything else, so its extra compute buys almost nothing. CPU wins every time.
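For a rough sense of why, here's a back-of-envelope sketch (illustrative numbers, not measurements from any specific device): during decode, every parameter has to be read from memory once per generated token, so throughput is roughly bandwidth divided by model size in bytes, regardless of which unit does the math.

```python
# Rough decode throughput ceiling for autoregressive generation.
# Assumption: every parameter is read once per token, so tokens/sec
# is capped by (memory bandwidth) / (model size in bytes), no matter
# whether the CPU, GPU, or NPU is doing the multiplies.

def decode_ceiling_tok_s(params_billion: float,
                         bytes_per_param: float,
                         bandwidth_gb_s: float) -> float:
    model_gb = params_billion * bytes_per_param  # GB streamed per token
    return bandwidth_gb_s / model_gb

# Example (hypothetical): 7B model, 4-bit quantized (~0.5 bytes/param),
# 100 GB/s unified memory -> ~28 tokens/s ceiling, shared by every unit.
print(decode_ceiling_tok_s(7, 0.5, 100))
```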