OrangePi 6 Plus Review

>>ekianj+(OP)
When something has a 30 TOPS NPU, what are the implications? Do NPUs like this have some common backend that ggml/llama.cpp targets? Is it proprietary, working only with some specific software? Does it have access to all the system RAM, and at what bandwidth?

I know the concept has been around for a while, but I have no idea if it actually means anything. I assume that people are targeting the ones in common devices like Apple's, but what about here?

>>andy99+Ta
Can't speak to this specific NPU, but these kinds of accelerators are really made more for general ML things like machine vision. For example, while people have made the (6 TOPS) NPU in the (similar board) RK3588 work with llama.cpp, it isn't super useful because of the RAM constraints. I believe it has some sort of 32-bit memory addressing limit, so you can never give it more than 3 or 4 GB. So for LLMs, not all that useful.
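
For a rough sense of why a 32-bit addressing limit would hurt for LLMs, here is a back-of-the-envelope sketch. It assumes the limit really is a flat 32-bit address space and counts weights only, ignoring the KV cache and activations; the model sizes and quantization widths are illustrative picks, not measurements from the RK3588.

```python
GIB = 2 ** 30  # bytes in one GiB


def addressable_gib(address_bits: int) -> float:
    """Largest memory range reachable with the given address width, in GiB."""
    return 2 ** address_bits / GIB


def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


limit = addressable_gib(32)  # assumed 32-bit limit

# Illustrative (parameter count in billions, bits per weight) pairs.
for params, bits in [(1, 4), (7, 4), (7, 8), (13, 4)]:
    size = weights_gib(params, bits)
    verdict = "fits" if size < limit else "does not fit"
    print(f"{params}B @ {bits}-bit: {size:.2f} GiB, {verdict} under {limit:.0f} GiB")
```

Even where the weights squeeze in, the remaining headroom has to cover the KV cache and everything else the NPU touches, which would be consistent with the "3 or 4 GB" ceiling being a practical problem.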
