Special values like NaN are half-assed sum types. The latter give you compiler guarantees.
In my opinion it's overall cleaner if the compiler enforces this when it can. Something like "ensure variable is initialized" can just be another compiler check.
Combine that with an effects system that lets you control which errors have their checking enforced. Nim has a nice `forbids: IOException` pragma that lets users do that.
Only sometimes, when the compiler happens to be able to understand the code fully enough. With sum types it can be enforced all the time, and bypassed when the programmer explicitly wants it to be.
From memory, I have heard "infecting all downstream" described as both "a feature" and "a problem". Experience with numpy programs did lead to sentinels in the https://github.com/c-blake/nio Nim package, though.
Another way to try to investigate popularity here is to see how much code uses signaling NaN vs. quiet NaN and/or arguments pro/con those things / floating point exceptions in general.
I imagine all of it comes down to questions of how locally code can/should be forced to confront problems, much like arguments about try/except/catch kinds of exception handling systems vs. other alternatives. In the age of SIMD there can also be performance angles to these questions: essentially "batching factors" for error handling that relate to all the other batching factors going on.
Today's version of this wiki page also includes a discussion of integer NaN: https://en.wikipedia.org/wiki/NaN . It notes that the R language uses the minimum signed integer value (i.e. 0x80000000) for NA.
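For illustration, a tiny Rust sketch of that sentinel convention (the `INT_NA` and `is_na` names are mine, not R's):

    // R represents integer NA as the minimum signed 32-bit value.
    const INT_NA: i32 = i32::MIN; // bit pattern 0x80000000

    fn is_na(x: i32) -> bool {
        x == INT_NA
    }

    fn main() {
        assert!(is_na(i32::MIN));
        assert!(!is_na(0)); // every other bit pattern is an ordinary integer
    }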
There is also the whole database NULL question: https://en.wikipedia.org/wiki/Null_(SQL)
To be clear, I am not taking some specific position, but I think all these topics inform answers to your question. I think it's something with trade-offs that people have a tendency to over-simplify based on a limited view.
tbh this system (assuming it works that way) would be more strict at compile-time than the vast majority of languages.
That's fair, I wasn't dismissing the practice but rather just commenting that it's a shame the author didn't clarify their preference.
I don't think the popularity angle is a good proxy for the usefulness/correctness of the practice. Many factors can influence popularity.
Performance is a very fair point. I don't know enough to understand the details, but I could see it being a strong argument. It is counterintuitive to move forward with calculations known to be useless, but maybe the cost of checking all calculations for validity is larger than the savings of skipping the invalid ones early.
There is a catch, though. Numpy and R are very oriented to calculation pipelines, which is a very different use case from general programming, where the side effects of undetected 'corrupt' values can be more serious.
Anyway, this topic of "error handling scoping/locality" may be the single most cross-cutting topic across CPUs, PLangs, databases, and operating systems (I would bin Numpy/R under PLangs+Databases as they are kind of "data languages"). Consequently, opinions can be very strong (often having this sense of "Everything hinges on this!") in all directions, but rarely take a "complete" view.
If you are interested in "fundamental, not just popularity" discussions, and it sounds like you are, I feel like the database community discussions are probably the most "refined/complete" in terms of trade-offs, but that could simply be my personal exposure, and DB people tend to ignore CPU SIMD because it's such a "recent" innovation (hahaha, Seymour Cray was doing it in the 1980s for the Cray-3 vector supercomputer). Anyway, just trying to help. That link to the DB Null page I gave is probably a good starting point.
That being said, I agree that the way NaNs propagate is messy. You can end up only finding out that there was an error much later during the program's execution and then it can be tricky to find out where it came from.
If you actually want the compiler to check this on the level of the type system, it'd have to be `NonNaNFloat | NaN`. Then you can check which one you have and continue with a float that is guaranteed to not be NaN.
But (importantly) a NonNaNFloat is not the same as a float, and this distinction has to be encoded in the type system if you want to take this approach seriously. This distinction is NOT supported by most type systems (including Rust's std afaik, fwiw). It's similar to Rust's NonZero family of types.
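A minimal Rust sketch of what that split could look like (the `NonNanFloat` and `Checked` names are hypothetical; std has no such types):

    #[derive(Clone, Copy)]
    struct NonNanFloat(f64); // invariant: the payload is never NaN

    enum Checked {
        Num(NonNanFloat),
        Nan,
    }

    impl NonNanFloat {
        fn new(x: f64) -> Checked {
            if x.is_nan() { Checked::Nan } else { Checked::Num(NonNanFloat(x)) }
        }
    }

    // You must match before you can use the value as a plain float.
    fn demo(x: f64) {
        match NonNanFloat::new(x) {
            Checked::Num(n) => println!("guaranteed not NaN: {}", n.0),
            Checked::Nan => println!("handle the NaN case here"),
        }
    }

    fn main() {
        demo(1.0);
        demo(f64::NAN);
    }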
Unfortunately, Rust doesn't seem to be smart enough to represent `Option<NotNan<f64>>` in 8 bytes, even though in theory it should be possible (it does the analogous thing with `Option<NonZero<u64>>`).
This thread is discussing the possibility of adding such an optimization: https://internals.rust-lang.org/t/add-float-types-with-niche...
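You can see the difference with std types alone (`NotNan` above is the ordered-float crate's wrapper):

    use std::mem::size_of;
    use std::num::NonZeroU64;

    fn main() {
        // The all-zero bit pattern is a "niche" the compiler reuses for None:
        assert_eq!(size_of::<Option<NonZeroU64>>(), 8);
        // f64 currently declares no such niche, so the Option needs a separate
        // discriminant (padded to alignment), as it would for Option<NotNan<f64>>:
        assert_eq!(size_of::<Option<f64>>(), 16);
    }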
An approach that I think would have most of the same correctness benefits as a proper sum type while being more ergonomic: Have two float types, one that can represent any float and one that can represent only finite floats. Floating-point operations return a finite float if all operands are of finite-float type, or an arbitrary float if any operand is of arbitrary-float type. If all operands are of finite-float type but the return value is infinity or NaN, the program panics or equivalent.
(A slightly more out-there extension of this idea: The finite-float type also can't represent negative zero. Any operation on finite-float-typed operands that would return negative zero returns positive zero instead. This means that finite floats obey the substitution property, and (as a minor added bonus) can be compared for equality by a simple bitwise comparison. It's possible that this idea is too weird, though, and there might be footguns in the case where you convert a finite float to an arbitrary one.)
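Here's a rough Rust sketch of the scheme, including the negative-zero normalization (the `Finite` type is hypothetical, and only `+` is shown):

    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Finite(f64); // invariant: always finite, never -0.0

    impl Finite {
        fn new(x: f64) -> Option<Finite> {
            // Reject infinities and NaN at the boundary; normalize -0.0.
            if x.is_finite() { Some(Finite(if x == 0.0 { 0.0 } else { x })) } else { None }
        }
    }

    impl std::ops::Add for Finite {
        type Output = Finite;
        fn add(self, rhs: Finite) -> Finite {
            let r = self.0 + rhs.0;
            // Finite op Finite must stay finite, or the program panics.
            assert!(r.is_finite(), "float operation produced a non-finite value");
            Finite(if r == 0.0 { 0.0 } else { r }) // map -0.0 back to +0.0
        }
    }

    fn main() {
        let a = Finite::new(1.5).unwrap();
        let b = Finite::new(2.5).unwrap();
        assert_eq!(a + b, Finite(4.0));
    }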
Hypothetically, no, the float type would not admit NaNs. You would be prevented from storing NaNs in them explicitly, and operations capable of producing NaNs would produce a `float | nan` type that is distinct from float, and can't be treated like float until it's checked for NaN.
And I'm not sure why it's being discussed as though this is some esoteric language feature. This is precisely the way non-nullable types work in languages like Kotlin and TypeScript. The underlying machine representation of the object is capable of containing null values, yes, but the compiler doesn't let you treat it as such (without certain workarounds).
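Approximated in Rust terms, that's just ordinary Option narrowing; the compiler refuses to treat the wrapper as the payload until you check:

    fn greet(name: Option<String>) {
        // println!("{}", name.len()); // does not compile: Option<String> is not String
        if let Some(n) = name {
            println!("hello, {n}"); // inside the branch, n is a plain String
        }
    }

    fn main() {
        greet(Some("world".to_string()));
        greet(None); // the "null" case simply takes the other path
    }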
If my grandmother had wheels, she'd be a bike.
I suppose there's precedent of sorts in signaling NaNs (and NaNs in general, since FPUs need to account for payloads), but I don't know how much software actually makes use of sNaNs/payloads, nor how those features work in GPUs/super-performance-sensitive code.
I also feel that as far as Rust goes, the NonZero<T> types would seem to point towards not using the described finite/arbitrary float scheme as the NonZero<T> types don't implement "regular" arithmetic operations that can result in 0 (there's unsafe unchecked operations and explicit checked operations, but no +/-/etc.).
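Concretely, with std's NonZeroU64 (its checked_add takes a plain u64 and returns an Option):

    use std::num::NonZeroU64;

    fn main() {
        let a = NonZeroU64::new(3).unwrap();
        // let s = a + a; // does not compile: NonZeroU64 implements no `Add`
        let s = a.checked_add(5).unwrap(); // explicit, fallible arithmetic instead
        assert_eq!(s.get(), 8);
    }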
The key disanalogy between NonZero and the "finite float" idea is that zero comes up all the time in basically every kind of math, so you can't just use NonZero everywhere in your code; you have to constantly deal with the seam converting between the two types, which is the most unwieldy part of the scheme. By contrast, in many programs infinity and NaN are never expected to come up, and if they do it's a bug, so if you're in that situation you can just use the finite-float type throughout.
I suppose that's a fair point. I guess a better analogy might be to operations on normal integer types, where overflow is considered an error but that is not reflected in default operator function signatures.
I do want to circle back a bit and say that my mention of signaling NaNs would probably have been better served by a discussion of floating point exceptions more generally. In particular, I feel like existing IEEE floating point technically supports something like what you propose via hardware floating point exceptions and/or sNaNs, but I don't know how well those capabilities are actually supported (e.g., from what I remember the C++ interface for dealing with that kind of thing was clunky at best).

I want to say that lifting those semantics into programming languages might interfere with normally desirable optimizations as well (e.g., effectively adding a branch after every floating point operation might interfere with vectorization), though I suppose Rust could always pull what it did with integer overflow and turn off checks in release mode, as much as I dislike that decision.
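For reference, the integer-overflow precedent mentioned above (behavior depends on the `overflow-checks` profile setting, on by default in debug builds and off in release):

    fn main() {
        let x: u8 = "255".parse().unwrap(); // runtime value, so no compile-time lint
        let y = x + 1; // panics under debug builds; wraps to 0 under release
        println!("{y}");
    }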
This is fine, I guess, but it will cause a bunch of problems, since e.g. division of two floats has to be able to return NaNs. At that point you either need to require a check to see if the value is NaN (inconvenient and annoying) or allow people to just proceed. Not sure I am exactly sold on this so far.
The parent commenter stated that sum types work differently from a hypothetical float/NaN split, because compilers can't always "understand the code fully enough" to enforce checks. I simply responded that that is not true in principle, since you could just treat non-NaN floats the same way that many languages treat non-null types.

Indeed, everything you're describing about non-NaN floats applies equally to sum types: you can't operate on them unless you pattern match. You're highlighting the exact point I'm trying to make!

The fact that you consider this system "inconvenient" is entirely irrelevant to this discussion. Maybe the designer of Nim simply cares more about NaN-safety than you or I do. Who knows. Regardless, the original statement (that sum types and non-NaN floats can't work the same way) is incorrect.
    if name != nil:
      echo name

versus

    case name
    of Some(unwrappedName):
      echo unwrappedName

I was thinking about this the other day for integer wrapping specifically, given that it's not checked in release mode for Rust (by default at least; I think there's a way to override that?). I suspect it's also influenced by the fact that people kinda expect to be able to use operators for arithmetic, and it's not really clear how to deal with something like `a + b + c` in a way where each step has to be fallible; you could have errors propagate and then just have `(a + b + c)?`, but I'm not sure that would be immediately intuitive to people, or you could require it to be explicit at each step, e.g. `((a + b)? + c)?`, but that would be fairly verbose. The best I could come up with is a macro that does the first thing, which I imagine someone has probably already written, where you could do something like `checked!(a + b + c)` and have it give a single result. I could almost imagine a language with more special syntax having a built-in operator for that, like wrapping it in double backticks, rather than `checked!(...)`.
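A minimal sketch of that propagate-then-check-once style in today's Rust, using std's checked_add and `?` (the `checked!` macro itself is hypothetical and would expand to something like this):

    fn sum3(a: u64, b: u64, c: u64) -> Option<u64> {
        // Each step is fallible; `?` propagates the first overflow as None.
        a.checked_add(b)?.checked_add(c)
    }

    fn main() {
        assert_eq!(sum3(1, 2, 3), Some(6));
        assert_eq!(sum3(u64::MAX, 1, 0), None); // overflow surfaces once, at the end
    }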