>>ekianj+(OP)
When something has a 30 TOPS NPU, what are the implications? Do NPUs like this have some common backend that ggml/llama.cpp targets? Is it proprietary, working only with some specific software? Does it have access to all the system RAM, and at what bandwidth?
I know the concept has been around for a while, but I have no idea if it actually means anything. I assume people are targeting the ones in common devices, like Apple's, but what about here?