It's sad to me that Larrabee didn't catch on, as it might have been a path to a good parallel computer: one with efficient parallel throughput like a GPU but agility closer to a CPU, so you don't need to batch work into huge dispatches and wait RPC-like latencies for them to complete. Apparently the main thing that sank it was power consumption.
[1]: https://learn.microsoft.com/en-us/windows/win32/direct3darti...