zlacker

[return to "Std: Clamp generates less efficient assembly than std:min(max,std:max(min,v))"]
1. fooker+Ah 2024-01-16 13:18:31
>>x1f604+(OP)
If you benchmark these, you'll likely find the version with the jump edges out the one with the conditional instruction in practice.
◧◩
2. pclmul+xn 2024-01-16 13:57:56
>>fooker+Ah
Compilers often under-generate conditional instructions. They implicitly assume (correctly) that most branches you write are 90/10 (i.e., very predictable), not 50/50. The branches that really are 50/50 then suffer from being treated as 90/10.
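
For reference, here is a minimal sketch of the two source forms being compared in this thread. The function names `clamp_branchy` and `clamp_minmax` are made up for illustration. Whether either one ends up as a conditional jump or as a conditional/min-max instruction depends on the compiler, flags, and target, so check the generated assembly rather than assuming.

```cpp
#include <algorithm>

// Written with explicit comparisons. A compiler may lower these to
// conditional jumps, but it is free to pick something else.
double clamp_branchy(double v, double lo, double hi) {
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

// The min/max composition from the thread title. On many targets this
// maps naturally onto min/max or conditional-select instructions.
double clamp_minmax(double v, double lo, double hi) {
    return std::min(hi, std::max(lo, v));
}
```

Both return the same result whenever `lo <= hi` and no argument is NaN; the difference under discussion is only in the code the compiler emits for them.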
◧◩◪
3. IainIr+Y12 2024-01-16 21:53:20
>>pclmul+xn
It's hard to predict statically which branches will be dynamically unpredictable.
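
A small sketch of why this is hard, assuming a hypothetical helper `count_at_least`: the source below is identical no matter what data it is given, yet how predictable the comparison is depends entirely on the order of the input, which the compiler cannot see.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the elements that are >= threshold.
std::size_t count_at_least(const std::vector<int>& data, int threshold) {
    std::size_t count = 0;
    for (int x : data) {
        // If this compiles to a branch, its predictability is a property
        // of the data, not of the source code.
        if (x >= threshold) {
            ++count;
        }
    }
    return count;
}
```

On sorted input the comparison flips outcome once and stays there; on shuffled input with a threshold near the median it is close to a coin flip. A compiler might also turn the loop into branch-free or vectorized code, in which case the distinction disappears, so this is only an illustration of the static-versus-dynamic gap.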

A seasoned hardware architect once told me that Intel went all-in on predication for Itanium, under the assumption that a Sufficiently Smart Compiler could figure it out, and then discovered to their horror that their compiler team's best efforts were not Sufficiently Smart. He implied that this was why Intel pushed to get a profile-guided optimization step added to the SPEC CPU benchmark, since profiling was the only way to get sufficiently accurate data.

I've never gone back to see whether the timeline checks out, but it's a good story.

◧◩◪◨
4. fooker+E22 2024-01-16 21:57:12
>>IainIr+Y12
The compiler doesn't do much of the predicting; that's done by the CPU at runtime.