
Depending on the order of the arguments to std::min and std::max, you'll get an extra move instruction [1]:

std::min(max, std::max(min, v));

        maxsd   xmm0, xmm1
        minsd   xmm0, xmm2

std::min(std::max(v, min), max);

        maxsd   xmm1, xmm0
        minsd   xmm2, xmm1
        movapd  xmm0, xmm2

On x86, if either operand of minsd/maxsd is NaN, the instruction copies the second operand into the first. So the compiler can't reorder the second case to look like the first (to leave the result in xmm0 for the return value), because swapping the operands would change the result whenever a NaN is involved.

The reason for this NaN behavior is that minsd is implemented to look like `(a < b) ? a : b`, where if any of a or b is NaN the condition is false, and the expression evaluates to b.

Possibly std::clamp has the comparisons ordered like the second case?

[1]: https://godbolt.org/z/coes8Gdhz

replies(4): >>x1f604+Gf >>vitors+Oh >>miohta+c21 >>lebubu+GL2

>>cmovq+(OP)
I think the libstdc++ implementation does indeed have the comparisons ordered in the way that you describe. I stepped into the std::clamp() call in gdb and got this:

    /usr/include/c++/12/bits/stl_algo.h

       *  @pre `_Tp` is LessThanComparable and `(__hi < __lo)` is false.
       */
      template<typename _Tp>
        constexpr const _Tp&
        clamp(const _Tp& __val, const _Tp& __lo, const _Tp& __hi)
        {
          __glibcxx_assert(!(__hi < __lo));
          return std::min(std::max(__val, __lo), __hi);
        }

replies(1): >>cmovq+Ti

>>cmovq+(OP)
This seems very likely to be the reason. See also:

https://godbolt.org/z/q7e3MrE66

>>x1f604+Gf
Thanks for sharing. I don't know if the C++ standard mandates one behavior or the other; it really depends on how you want clamp to behave if the value is NaN. std::clamp returns NaN, while the reverse order returns the min value.

replies(2): >>cornst+8J >>x1f604+sw6

>>cmovq+Ti
From §25.8.9 Bounded value [alg.clamp]:

> 2 Preconditions: `bool(comp(proj(hi), proj(lo)))` is false. For the first form, type `T` meets the Cpp17LessThanComparable requirements (Table 26).

> 3 Returns: `lo` if `bool(comp(proj(v), proj(lo)))` is true, `hi` if `bool(comp(proj(hi), proj(v)))` is true, otherwise `v`.

> 4 [Note: If NaN is avoided, `T` can be a floating-point type. — end note]

From Table 26:

> `<` is a strict weak ordering relation (25.8)

replies(1): >>rahkii+Tj1

>>cmovq+(OP)
Sir cmovq, you have earned your username.

>>cornst+8J
Does that mean passing NaN to clamp is undefined behavior?

replies(1): >>cornst+2s1

>>rahkii+Tj1
My interpretation is that yes, passing NaN is undefined behavior. Strict weak ordering is defined in 25.8 Sorting and related operations [alg.sorting]:

> 4 The term strict refers to the requirement of an irreflexive relation (`!comp(x, x)` for all `x`), and the term weak to requirements that are not as strong as those for a total ordering, but stronger than those for a partial ordering. If we define `equiv(a, b)` as `!comp(a, b) && !comp(b, a)`, then the requirements are that `comp` and `equiv` both be transitive relations:

> 4.1 `comp(a, b) && comp(b, c)` implies `comp(a, c)`

> 4.2 `equiv(a, b) && equiv(b, c)` implies `equiv(a, c)`

NaN breaks these relations: `equiv(42.0, NaN)` and `equiv(NaN, 3.14)` are both true, which by transitivity would require `equiv(42.0, 3.14)` to be true as well. It clearly isn't, so floating-point numbers that include NaN do not satisfy the strict weak ordering requirement.

The standard doesn't explicitly say that passing NaN is undefined behavior. But it does not define the behavior of `std::clamp()` when NaN is passed, which I think by definition makes it undefined behavior.

>>cmovq+(OP)
Yes, I arrived at the same conclusion.

The various code snippets in the article don't compute the same function. The order of min() and max() matters even when done "by hand". This is apparent when min is greater than max: the results differ in which boundary is returned.
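A minimal C++ sketch of that difference. The names `min_of_max` and `max_of_min` are mine, and these are not the article's snippets or the code behind the link, just the two compositions written out:

```cpp
#include <algorithm>

// Two hand-written clamps that differ only in which of min/max is applied last.
double min_of_max(double v, double lo, double hi) {
    return std::min(std::max(v, lo), hi);  // upper bound applied last
}

double max_of_min(double v, double lo, double hi) {
    return std::max(std::min(v, hi), lo);  // lower bound applied last
}
```

With valid bounds (`lo <= hi`) the two agree. With inverted bounds, say `lo = 10` and `hi = 0`, `min_of_max(5, 10, 0)` gives `0` (the `hi` boundary) while `max_of_min(5, 10, 0)` gives `10` (the `lo` boundary): whichever bound is applied last wins.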

Funny how quickly the discussion of such simple functions can become difficult/interesting.

Some toying around with the various implementations in C [1]:

[1]: https://godbolt.org/z/d4Tcdojx3

replies(1): >>x1f604+zv6

>>lebubu+GL2
Yes, you are correct: the faster clamp is incorrect because it does not return v when v is equal to both lo and hi.

>>cmovq+Ti
Based on my reading of cppreference, std::clamp(-0.0f, +0.0f, +0.0f) is required to return negative zero, because when v compares equal to lo and hi the function is required to return v. The official std::clamp does this, but my incorrect clamp doesn't.
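A quick way to observe this, sketched under two assumptions: C++17 is available for `std::clamp`, and the "incorrect clamp" is the reversed ordering `std::min(hi, std::max(lo, v))` from the top comment. Both helper names are made up for this example:

```cpp
#include <algorithm>
#include <cmath>

// True if std::clamp hands back the negative zero that was passed as v.
bool std_clamp_keeps_negative_zero() {
    float r = std::clamp(-0.0f, +0.0f, +0.0f);
    return std::signbit(r);
}

// Same check for the reversed ordering. Assumption: this is the
// "incorrect clamp" meant above, i.e. std::min(hi, std::max(lo, v)).
bool reversed_clamp_keeps_negative_zero() {
    float lo = +0.0f, hi = +0.0f, v = -0.0f;
    float r = std::min(hi, std::max(lo, v));
    return std::signbit(r);
}
```

Since `-0.0f == +0.0f`, none of the comparisons is true, and std::min/std::max return their first argument in that case. The standard ordering therefore passes `v` through, while the reversed ordering ends up returning `hi`, which is positive zero here.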