
> It's not totally clear to me why simt won out over writing the vector operations.

From the user side, it is probably simpler to write an algorithm once without vectors, and have a compiler translate it to every vector ISA it supports, rather than to deal with each ISA by hand.

Besides, in many situations, having the algorithm executed sequentially or in parallel is irrelevant to the algorithm itself, so why introduce that concern?
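As a rough illustration of the "write it once without vectors" point, here is a minimal C++ sketch. The names `saxpy_one` and `saxpy` are made up for this example. The algorithm is stated per element, and whether the loop ends up as SSE, AVX, NEON or plain scalar code is left to the compiler (typically at something like `-O2`/`-O3`; whether it actually vectorizes depends on the compiler and target).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-element kernel: the algorithm, written once in scalar terms.
inline float saxpy_one(float a, float x, float y) {
    return a * x + y;
}

// Plain loop over elements. Nothing here names a vector ISA; an optimizing
// compiler is free to auto-vectorize it for whatever target it supports.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = saxpy_one(a, x[i], y[i]);
    }
}
```

The source says nothing about lane count or execution order, which is the concern the scalar style lets you skip.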

> I'm certainly in the minority opinion here.

There are definitely more userland programmers than compiler/numerical library ones.


>>alphab+(OP)
I think it's easier to write the kernel in terms of scalar types if you can avoid the warp-level intrinsics. If you need to mix that with cross-lane operations, it gets very confusing. In the general case, Volta requires you to pass the current lane mask into the intrinsics, so you get to try to calculate what the compiler will turn your branches into as a bitmap.

So if you need/want to reason partly in terms of warps, I think the complexity is lower to reason wholly in terms of warps. You have to use vector types and that's not wonderful, but in exchange you get predictable control flow out of the machine code.
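To make the lane-mask bookkeeping above concrete, here is a host-side C++ simulation, not real GPU code. It assumes a 32-lane warp, and `Warp`, `ballot`, `split` and `masked_sum` are hypothetical helpers invented for this sketch. They only loosely mirror the idea behind CUDA's `*_sync` intrinsics, which take a lane mask as their first argument.

```cpp
#include <array>
#include <cstdint>

constexpr int kWarpSize = 32;             // assumption: 32 lanes per warp
using Warp = std::array<int, kWarpSize>;  // one value per lane

// Hypothetical ballot: bit i of the result is set when lane i is active
// and its value satisfies the predicate.
template <typename Pred>
std::uint32_t ballot(std::uint32_t active, const Warp& v, Pred pred) {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        const std::uint32_t bit = std::uint32_t{1} << lane;
        if ((active & bit) && pred(v[lane])) mask |= bit;
    }
    return mask;
}

// The two bitmaps a scalar-style `if (pred(x)) ... else ...` splits into.
struct BranchMasks {
    std::uint32_t taken;
    std::uint32_t not_taken;
};

template <typename Pred>
BranchMasks split(std::uint32_t active, const Warp& v, Pred pred) {
    const std::uint32_t taken = ballot(active, v, pred);
    return {taken, active & ~taken};
}

// Hypothetical cross-lane operation: sums only the lanes named in `mask`,
// which is why the caller has to know the mask for the branch it is in.
int masked_sum(std::uint32_t mask, const Warp& v) {
    int total = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (mask & (std::uint32_t{1} << lane)) total += v[lane];
    }
    return total;
}
```

Written this way the masks are explicit values in the program. In scalar-style kernel code you would have to infer the same bitmaps from the branch structure before calling a cross-lane operation inside a branch.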

The argument is a bit moot though, since right now you can't program either vendor's hardware using vectors, so you would also need to drop down to assembly. None of the GPUs are very easy to program in assembly.