zlacker


SIMT lets a scheduler get clever about memory accesses. SIMD can practically only access memory linearly (scatter/gather can do better, but it's still usually quite linear), whereas SIMT can be much smarter: it keeps lots of similar bits of work in flight in ways that use the bandwidth maximally and don't overlap.

replies(1): >>kllrno+E

>>mhh__+(OP)
https://developer.nvidia.com/blog/how-access-global-memory-e...

SIMT still expects coalesced memory accesses that are close together; otherwise performance falls off a cliff.
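
To put rough numbers on that cliff, here is a toy model rather than a measurement. It assumes a 32-thread warp, 128-byte memory lines and 4-byte elements, which is a simplification of real hardware; `lines_touched` is a made-up helper name. It counts how many distinct lines one warp touches when thread t reads element t * stride:

```python
WARP_SIZE = 32    # threads per warp (assumed, NVIDIA-style)
LINE_BYTES = 128  # memory line / segment size in bytes (assumed)
ELEM_BYTES = 4    # one 4-byte value per thread (assumed)

def lines_touched(stride_elems, base_byte=0):
    """Count distinct lines hit when thread t reads element t * stride_elems."""
    addrs = (base_byte + t * stride_elems * ELEM_BYTES for t in range(WARP_SIZE))
    return len({addr // LINE_BYTES for addr in addrs})

for stride in (1, 2, 8, 32):
    # More lines per warp means more memory traffic for the same useful data.
    print(f"stride {stride:2d}: {lines_touched(stride):2d} line(s) per warp")
```

In this model a stride of 1 needs a single line for the whole warp, while a stride of 32 needs one line per thread, so most of each fetched line is wasted.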

replies(1): >>the_sv+yL1

>>kllrno+E
Yes, but not all threads in the block need to. As long as you fill a cache line you’re good.
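
A rough sketch of that point, again as a toy model and not real hardware behaviour. It assumes 32-thread warps, 128-byte cache lines and 4-byte elements, and `block_access` is a hypothetical helper. Each warp in a block reads from a completely different place, and the lanes inside a warp are shuffled, but every warp still fills exactly one line:

```python
import random

WARP_SIZE = 32    # threads per warp (assumed)
LINE_BYTES = 128  # cache line size in bytes (assumed)
ELEM_BYTES = 4    # one 4-byte value per thread (assumed)

def block_access(warp_base_lines, seed=0):
    """For each warp, return how many distinct cache lines its threads touch.

    Every warp reads one full, aligned line starting at its own base line;
    the order of lanes within that line is shuffled.
    """
    rng = random.Random(seed)
    counts = []
    for base_line in warp_base_lines:
        lanes = list(range(WARP_SIZE))
        rng.shuffle(lanes)  # which thread reads which element does not matter here
        addrs = [base_line * LINE_BYTES + lane * ELEM_BYTES for lane in lanes]
        counts.append(len({addr // LINE_BYTES for addr in addrs}))
    return counts

# Four warps of one block, reading from widely scattered lines.
print(block_access([7, 1024, 3, 500000]))
```

The warps are nowhere near each other in memory, yet each one touches a single line, which is the case described above.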
