zlacker

[return to "Std: Clamp generates less efficient assembly than std:min(max,std:max(min,v))"]
1. fooker+Ah 2024-01-16 13:18:31
>>x1f604+(OP)
If you benchmark these, you'll likely find the version with the jump edges out the one with the conditional instruction in practice.
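For reference, the two formulations being compared can be sketched as below. The function names `clamp_std` and `clamp_minmax` are made up for illustration, and `float` is an assumption about the element type. Which form ends up as a jump and which as a conditional instruction depends on the compiler, flags, and target, so it is worth inspecting the assembly for your own build.

```cpp
#include <algorithm>

// Form A: the standard library call (C++17). Requires lo <= hi.
inline float clamp_std(float v, float lo, float hi) {
    return std::clamp(v, lo, hi);
}

// Form B: the min/max composition from the thread title.
// Returns the same value as Form A when lo <= hi and no argument is NaN.
inline float clamp_minmax(float v, float lo, float hi) {
    return std::min(hi, std::max(lo, v));
}
```

Both functions can be dropped into a benchmark harness or a compiler explorer session side by side.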
◧◩
2. pclmul+xn 2024-01-16 13:57:56
>>fooker+Ah
Compilers often under-generate conditional instructions. They implicitly assume (correctly) that most branches you write are 90/10 (i.e., very predictable), not 50/50. The branches that really are 50/50 then pay for being compiled as if they were 90/10.
◧◩◪
3. fooker+Tx 2024-01-16 14:59:59
>>pclmul+xn
The branches in this example are not 50/50.

Given a few million calls of clamp, most would be no-ops in practice. Modern CPUs are very good at dynamically observing this.
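Whether most calls really are no-ops depends on the data, so it may be worth measuring on a sample of real inputs. A minimal sketch, assuming integer inputs held in a `std::vector<int>`; the helper name `noop_fraction` is hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the fraction of values already inside [lo, hi], i.e. the
// share of clamp calls that would leave their input unchanged.
// Returns 0.0 for an empty input.
inline double noop_fraction(const std::vector<int>& values, int lo, int hi) {
    if (values.empty()) return 0.0;
    const auto in_range = std::count_if(
        values.begin(), values.end(),
        [lo, hi](int v) { return v >= lo && v <= hi; });
    return static_cast<double>(in_range) /
           static_cast<double>(values.size());
}
```

A result near 1.0 is consistent with the branches being easy to predict. A result near 0.5 suggests they may not be, although the ordering of the values matters as well as the ratio.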

◧◩◪◨
4. pclmul+x93 2024-01-17 06:26:04
>>fooker+Tx
Do you know that for a fact? For all calls of clamp? I have definitely used min and max when they are true 50/50s and I assume clamp also gets some similar use.
◧◩◪◨⬒
5. fooker+2m3 2024-01-17 08:07:27
>>pclmul+x93
Modern compilers generate code assuming all branches are highly predictable.

If your use case does not follow that pattern and you really care about performance, you have to pull out something like inline assembly.

Consider software like ffmpeg, which has to do this for the sake of performance.
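Short of inline assembly, one portable thing to try is writing the select arithmetically so there is no branch in the source at all. This is only a sketch for 32-bit integers; the names `select_i32` and `clamp_branchless` are hypothetical, and the compiler is still free to turn it back into a branch or a conditional move, so check the generated assembly and benchmark before relying on it.

```cpp
#include <cstdint>

// Branch-free select: returns a if cond is true, otherwise b.
// mask is all ones when cond is true and all zeros when it is false.
// The final unsigned-to-signed conversion is modular on mainstream
// compilers (and guaranteed to be from C++20 onward).
inline std::int32_t select_i32(bool cond, std::int32_t a, std::int32_t b) {
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    return static_cast<std::int32_t>((ua & mask) | (ub & ~mask));
}

// Clamp built from two selects; assumes lo <= hi.
inline std::int32_t clamp_branchless(std::int32_t v,
                                     std::int32_t lo,
                                     std::int32_t hi) {
    const std::int32_t raised = select_i32(v < lo, lo, v);
    return select_i32(raised > hi, hi, raised);
}
```

The comparisons `v < lo` and `raised > hi` produce a 0 or 1 value rather than a jump in the source, which is the point of the exercise.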
