This statement is comparing the SIMT model to SIMD. Can anyone explain the last part, about SIMT being better for many programs that each operate on their own data? Are they just saying you can have individual “threads” executing independently (via predication, masks, and such)?
SIMT still expects coalesced memory access: the threads in a warp need to touch addresses that are close together, otherwise performance falls off a cliff.
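
A rough way to see the coalescing point is a toy model in Python. It is not real hardware behavior and not any vendor's API; it just counts how many memory segments a single warp-wide load touches. The figures (32 threads per warp, 128-byte segments, 4-byte elements) are assumptions picked for illustration, and `segments_touched` is a made-up helper name.

```python
# Toy model (not real hardware): count the memory segments one warp touches.
# Assumed figures: 32 threads per warp, 128-byte segments, 4-byte elements.
WARP_SIZE = 32
SEGMENT_BYTES = 128
ELEM_BYTES = 4


def segments_touched(stride, base=0):
    """Distinct segments hit when thread t loads element base + t * stride."""
    addresses = [(base + t * stride) * ELEM_BYTES for t in range(WARP_SIZE)]
    return len({addr // SEGMENT_BYTES for addr in addresses})


if __name__ == "__main__":
    for stride in (1, 2, 8, 32):
        count = segments_touched(stride)
        print(f"stride {stride:2d}: {count} segment(s) per warp load")
```

Under these assumptions, a stride of 1 (neighboring threads read neighboring elements) touches 1 segment, while a stride of 32 touches 32 segments for the same 32 useful values. Each "thread" has its own data in both cases, but only the first layout keeps the memory traffic low.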