zlacker

[parent] [thread] 0 comments
1. x1f604+(OP)[view] [source] 2024-01-18 09:13:41
I don't think it's a register allocation failure but is in fact necessitated by the ABI requirement (calling convention) for the first parameter to be in xmm0 and the return value to also be placed into xmm0.

So when you have an algorithm like clamp which requires v to be "preserved" throughout the computation you can't overwrite xmm0 with the first instruction, basically you need to "save" and "restore" it which means an extra instruction.

I'm not sure why this causes the extra assembly to be generated in the "realistic" code example though. See https://godbolt.org/z/hd44KjMMn

[go to top]