Clang generates the shortest of these if you target sandybridge, x86-64-v3, or later. The real story buried in this article is that compilers target k8-generic unless you tell them otherwise, and the feature set and cost model of the Opteron are obsolete.
>>jeffbe+(OP)
Yep. Adding "-C target-cpu=native" to rustc on my desktop computer consistently gets a ~10-15% performance boost compared to the default target. The default target is extremely conservative: as far as I can tell, it doesn't take advantage of any CPU features added in the last 20 years. (The K8 came out in 2003.)
>>joseph+Q6
Red Hat Enterprise Linux has raised its default target to x86-64-v2 (with RHEL 9) and is considering switching to x86-64-v3 for RHEL 10 (which should release around 2026?). I'd take that as a sign that those might be reasonable choices for newly released software.
Some Linux distros also let you choose between a build compatible with ancient hardware and an optimized x86-64-v3 build, which seems like a good compromise.
>>jeffbe+f8
Funny that this gap disappeared for a while around 2006: AMD64 became widespread while still being very new, which closed the gap between "default" and "native".
>>jeffbe+(OP)
Even with -march=x86-64-v4 at -O3, the compiler still generates fewer lines of assembly for the incorrect clamp than for the correct clamp in this "realistic" code: