Special values like NaN are half-assed sum types. The latter give you compiler guarantees.
In my opinion it's overall cleaner if the compiler enforces this when it can. Something like "ensure variable is initialized" can just be another compiler check.
Combine that with an effects system that lets you control which errors have their checking enforced. Nim has a nice `forbids: IOException` pragma that lets users do that.
Only sometimes, when the compiler happens to be able to understand the code fully enough. With sum types it can be enforced all the time, and bypassed when the programmer explicitly wants it to be.
From memory, I have heard "infecting all downstream" described as both "a feature" and "a problem". Experience with numpy programs did lead to sentinels in the https://github.com/c-blake/nio Nim package, though.
Another way to try to investigate popularity here is to see how much code uses signaling NaN vs. quiet NaN and/or arguments pro/con those things / floating point exceptions in general.
I imagine all of it comes down to questions of how locally code can/should be forced to confront problems, much like arguments about try/except/catch kinds of exception handling systems vs. other alternatives. In the age of SIMD there can also be performance angles to these questions: essentially "batching factors" for error handling that relate to all the other batching factors going on.
Today's version of this wiki page also includes a discussion of integer NaN: https://en.wikipedia.org/wiki/NaN . It notes that the R language uses the minimum signed integer value (i.e. 0x80000000) for NA.
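For illustration, a tiny Rust sketch of that sentinel convention (the `INT_NA` and `is_na` names are mine, not R's):

    // R represents integer NA as the minimum signed 32-bit value.
    const INT_NA: i32 = i32::MIN; // bit pattern 0x80000000

    fn is_na(x: i32) -> bool {
        x == INT_NA
    }

    fn main() {
        assert!(is_na(i32::MIN));
        assert!(!is_na(0)); // every other bit pattern is an ordinary integer
    }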
There is also the whole database NULL question: https://en.wikipedia.org/wiki/Null_(SQL)
To be clear, I am not taking some specific position, but I think all these topics inform answers to your question. I think it's something with trade-offs that people have a tendency to over-simplify based on a limited view.
tbh this system (assuming it works that way) would be more strict at compile-time than the vast majority of languages.
That's fair, I wasn't dismissing the practice but rather just commenting that it's a shame the author didn't clarify their preference.
I don't think the popularity angle is a good proxy for the usefulness/correctness of the practice. Many factors can influence popularity.
Performance is a very fair point. I don't know enough to understand the details, but I could see it being a strong argument. It is counterintuitive to move forward with calculations known to be useless, but maybe the cost of checking all calculations for validity is larger than the savings of skipping the invalid ones early.
There is a catch, though. Numpy and R are very oriented to calculation pipelines, which is a very different use case from general programming, where the side effects of undetected 'corrupt' values can be more serious.
Anyway, this topic of "error handling scoping/locality" may be the single most cross-cutting topic across CPUs, PLangs, databases, and operating systems (I would bin Numpy/R under PLangs+Databases as they are kind of "data languages"). Consequently, opinions can be very strong (often having this sense of "Everything hinges on this!") in all directions, but rarely take a "complete" view.
If you are interested in "fundamental, not just popularity" discussions, and it sounds like you are, I feel like the database community discussions are probably the most "refined/complete" in terms of trade-offs, but that could simply be my personal exposure, and DB people tend to ignore CPU SIMD because it's such a "recent" innovation (hahaha, Seymour Cray was doing it in the 1980s for the Cray-3 vector supercomputer). Anyway, just trying to help. That link to the DB Null page I gave is probably a good starting point.
That being said, I agree that the way NaNs propagate is messy. You can end up only finding out that there was an error much later during the program's execution and then it can be tricky to find out where it came from.
If you actually want the compiler to check this on the level of the type system, it'd have to be `NonNaNFloat | NaN`. Then you can check which one you have and continue with a float that is guaranteed to not be NaN.
But (importantly) a NonNaNFloat is not the same as a float, and this distinction has to be encoded in the type system if you want to take this approach seriously. This distinction is NOT supported by most type systems (including Rust's std afaik, fwiw). It's similar to Rust's NonZero family of types.
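A minimal Rust sketch of what that split could look like (the `NonNanFloat` and `Checked` names are hypothetical; std has no such types):

    #[derive(Clone, Copy)]
    struct NonNanFloat(f64); // invariant: the payload is never NaN

    enum Checked {
        Num(NonNanFloat),
        Nan,
    }

    impl NonNanFloat {
        fn new(x: f64) -> Checked {
            if x.is_nan() { Checked::Nan } else { Checked::Num(NonNanFloat(x)) }
        }
    }

    // You must match before you can use the value as a plain float.
    fn demo(x: f64) {
        match NonNanFloat::new(x) {
            Checked::Num(n) => println!("guaranteed not NaN: {}", n.0),
            Checked::Nan => println!("handle the NaN case here"),
        }
    }

    fn main() {
        demo(1.0);
        demo(f64::NAN);
    }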
Unfortunately, Rust doesn't seem to be smart enough to represent `Option<NotNan<f64>>` in 8 bytes, even though in theory it should be possible (it does the analogous thing with `Option<NonZero<u64>>`).
This thread is discussing the possibility of adding such an optimization: https://internals.rust-lang.org/t/add-float-types-with-niche...
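You can see the difference with std types alone (`NotNan` above is the ordered-float crate's wrapper):

    use std::mem::size_of;
    use std::num::NonZeroU64;

    fn main() {
        // The all-zero bit pattern is a "niche" the compiler reuses for None:
        assert_eq!(size_of::<Option<NonZeroU64>>(), 8);
        // f64 currently declares no such niche, so the Option needs a separate
        // discriminant (padded to alignment), as it would for Option<NotNan<f64>>:
        assert_eq!(size_of::<Option<f64>>(), 16);
    }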
An approach that I think would have most of the same correctness benefits as a proper sum type while being more ergonomic: Have two float types, one that can represent any float and one that can represent only finite floats. Floating-point operations return a finite float if all operands are of finite-float type, or an arbitrary float if any operand is of arbitrary-float type. If all operands are of finite-float type but the return value is infinity or NaN, the program panics or equivalent.
(A slightly more out-there extension of this idea: The finite-float type also can't represent negative zero. Any operation on finite-float-typed operands that would return negative zero returns positive zero instead. This means that finite floats obey the substitution property, and (as a minor added bonus) can be compared for equality by a simple bitwise comparison. It's possible that this idea is too weird, though, and there might be footguns in the case where you convert a finite float to an arbitrary one.)
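Here's a rough Rust sketch of the scheme, including the negative-zero normalization (the `Finite` type is hypothetical, and only `+` is shown):

    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Finite(f64); // invariant: always finite, never -0.0

    impl Finite {
        fn new(x: f64) -> Option<Finite> {
            // Reject infinities and NaN at the boundary; normalize -0.0.
            if x.is_finite() { Some(Finite(if x == 0.0 { 0.0 } else { x })) } else { None }
        }
    }

    impl std::ops::Add for Finite {
        type Output = Finite;
        fn add(self, rhs: Finite) -> Finite {
            let r = self.0 + rhs.0;
            // Finite op Finite must stay finite, or the program panics.
            assert!(r.is_finite(), "float operation produced a non-finite value");
            Finite(if r == 0.0 { 0.0 } else { r }) // map -0.0 back to +0.0
        }
    }

    fn main() {
        let a = Finite::new(1.5).unwrap();
        let b = Finite::new(2.5).unwrap();
        assert_eq!(a + b, Finite(4.0));
    }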
Hypothetically, no, the float type would not admit NaNs. You would be prevented from storing NaNs in them explicitly, and operations capable of producing NaNs would produce a `float | nan` type that is distinct from float, and can't be treated like float until it's checked for NaN.
And I'm not sure why it's being discussed as though this is some esoteric language feature. This is precisely the way non-nullable types work in languages like Kotlin and TypeScript. The underlying machine representation of the object is capable of containing null values, yes, but the compiler doesn't let you treat it as such (without certain workarounds).
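Approximated in Rust terms, that's just ordinary Option narrowing; the compiler refuses to treat the wrapper as the payload until you check:

    fn greet(name: Option<String>) {
        // println!("{}", name.len()); // does not compile: Option<String> is not String
        if let Some(n) = name {
            println!("hello, {n}"); // inside the branch, n is a plain String
        }
    }

    fn main() {
        greet(Some("world".to_string()));
        greet(None); // the "null" case simply takes the other path
    }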
If my grandmother had wheels, she'd be a bike.
I suppose there's precedent of sorts in signaling NaNs (and NaNs in general, since FPUs need to account for payloads), but I don't know how much software actually makes use of sNaNs/payloads, nor how those features work in GPUs/super-performance-sensitive code.
I also feel that as far as Rust goes, the NonZero<T> types would seem to point towards not using the described finite/arbitrary float scheme as the NonZero<T> types don't implement "regular" arithmetic operations that can result in 0 (there's unsafe unchecked operations and explicit checked operations, but no +/-/etc.).
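Concretely, with std's NonZeroU64 (its checked_add takes a plain u64 and returns an Option):

    use std::num::NonZeroU64;

    fn main() {
        let a = NonZeroU64::new(3).unwrap();
        // let s = a + a; // does not compile: NonZeroU64 implements no `Add`
        let s = a.checked_add(5).unwrap(); // explicit, fallible arithmetic instead
        assert_eq!(s.get(), 8);
    }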
The key disanalogy between NonZero and the "finite float" idea is that zero comes up all the time in basically every kind of math, so you can't just use NonZero everywhere in your code; you have to constantly deal with the seam converting between the two types, which is the most unwieldy part of the scheme. By contrast, in many programs infinity and NaN are never expected to come up, and if they do it's a bug, so if you're in that situation you can just use the finite-float type throughout.
I suppose that's a fair point. I guess a better analogy might be to operations on normal integer types, where overflow is considered an error but that is not reflected in default operator function signatures.
I do want to circle back a bit and say that my mention of signaling NaNs would probably have been better served by a discussion of floating point exceptions more generally. In particular, I feel like existing IEEE floating point technically supports something like what you propose via hardware floating point exceptions and/or sNaNs, but I don't know how well those capabilities are actually supported (e.g., from what I remember the C++ interface for dealing with that kind of thing was clunky at best).

I want to say that lifting those semantics into programming languages might interfere with normally desirable optimizations as well (e.g., effectively adding a branch after every floating point operation might interfere with vectorization), though I suppose Rust could always pull what it did with integer overflow and turn off checks in release mode, as much as I dislike that decision.
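For reference, the integer-overflow precedent mentioned above (behavior depends on the `overflow-checks` profile setting, on by default in debug builds and off in release):

    fn main() {
        let x: u8 = "255".parse().unwrap(); // runtime value, so no compile-time lint
        let y = x + 1; // panics under debug builds; wraps to 0 under release
        println!("{y}");
    }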
This is fine, I guess, but it will cause a bunch of problems, since e.g. division of two floats has to be able to return NaNs. At that point you either need to require a check to see if the value is NaN (inconvenient and annoying) or allow people to just proceed. Not sure I am exactly sold on this so far.
The parent commenter stated that sum types work differently from a hypothetical float/NaN split, because compilers can't always "understand the code fully enough" to enforce checks. I simply responded that that is not true in principle, since you could just treat non-NaN floats the same way that many languages treat non-null types.

Indeed, everything you're describing about non-NaN floats applies equally to sum types: you can't operate on them unless you pattern match. You're highlighting the exact point I'm trying to make!

The fact that you consider this system "inconvenient" is entirely irrelevant to this discussion. Maybe the designer of Nim simply cares more about NaN-safety than you or I do. Who knows. Regardless, the original statement (that sum types and non-NaN floats can't work the same way) is incorrect.
    if name != nil:
      echo name

versus

    case name
    of Some(unwrappedName):
      echo unwrappedName

I was thinking about this the other day for integer wrapping specifically, given that it's not checked in release mode for Rust (by default at least; I think there's a way to override that?). I suspect it's also influenced by the fact that people kinda expect to be able to use operators for arithmetic, and it's not really clear how to deal with something like `a + b + c` in a way where each step has to be fallible; you could have errors propagate and then just have `(a + b + c)?`, but I'm not sure that would be immediately intuitive to people, or you could require it to be explicit at each step, e.g. `((a + b)? + c)?`, but that would be fairly verbose. The best I could come up with is a macro that does the first thing, which I imagine someone has probably already written, where you could do something like `checked!(a + b + c)` and have it give a single result. I could almost imagine a language with more special syntax having a built-in operator for that, like wrapping it in double backticks, rather than `checked!(...)`.
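A minimal sketch of that propagate-then-check-once style in today's Rust, using std's checked_add and `?` (the `checked!` macro itself is hypothetical and would expand to something like this):

    fn sum3(a: u64, b: u64, c: u64) -> Option<u64> {
        // Each step is fallible; `?` propagates the first overflow as None.
        a.checked_add(b)?.checked_add(c)
    }

    fn main() {
        assert_eq!(sum3(1, 2, 3), Some(6));
        assert_eq!(sum3(u64::MAX, 1, 0), None); // overflow surfaces once, at the end
    }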