Similarly, if you tried to port this (admittedly neat) system to ARM using such an approach, you'd hit a lot of bumps: starting with all those clever SSE2 hacks, which won't carry over automatically; through the caching and optimization assumptions made for the original architecture, which no longer hold in the new circumstances; to outright conceptual mismatches in certain parts. E.g., what if you need to handle the case where the system has to boot up in ARM Thumb mode? You just can't abstract that away in assembly.
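
For contrast, here is a rough sketch of what that kind of abstraction could look like in C. It assumes a compiler that defines the usual `__SSE2__` / `__ARM_NEON` macros (GCC and Clang do), and `add_u32` is a made-up helper for illustration, not anything from the system being discussed. The SIMD trick sits behind one function, and any target without either extension still gets the plain loop:

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* NEON intrinsics */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] (wrapping) for n elements. */
void add_u32(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* x86 path: four 32-bit lanes per iteration, unaligned loads/stores. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
#elif defined(__ARM_NEON)
    /* ARM path: same idea expressed with NEON intrinsics. */
    for (; i + 4 <= n; i += 4) {
        uint32x4_t va = vld1q_u32(a + i);
        uint32x4_t vb = vld1q_u32(b + i);
        vst1q_u32(dst + i, vaddq_u32(va, vb));
    }
#endif
    /* Scalar tail, and the whole job on targets with neither extension. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Callers only ever see `add_u32`; which path gets compiled in is decided per target. With hand-written assembly there is no equivalent seam: every such routine has to be rewritten for the new instruction set.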