zlacker

[parent] [thread] 11 comments
1. throwa+(OP)[view] [source] 2023-07-31 18:27:14
Errors in software rarely ever matter and even when they do, can usually be trivially corrected.
replies(4): >>crooke+81 >>progra+a5 >>bumby+16 >>crater+4a
2. crooke+81[view] [source] 2023-07-31 18:32:06
>>throwa+(OP)
Except when they do matter, like the Therac-25 deaths or those 737 MAX crashes.
replies(2): >>nawgz+m2 >>Bossin+1b
3. nawgz+m2[view] [source] [discussion] 2023-07-31 18:37:27
>>crooke+81
> 737 MAX crashes

To imply this was a software bug is a pretty silly representation - the system was poorly engineered and didn't have proper contingencies for sensor disagreement. This is pretty clearly a design/engineering error with a software component.

Besides, the guy said "rarely ever matter" for a reason, not "explicitly never impact things"... Bit of a silly comment from you IMO

replies(1): >>bumby+s5
4. progra+a5[view] [source] 2023-07-31 18:48:51
>>throwa+(OP)
It’s not life or death, but time spent dealing with errors - debugging, the direct effects, understanding full impact - isn’t a resource we can get back.
replies(2): >>wizofa+b9 >>Bossin+ha
5. bumby+s5[view] [source] [discussion] 2023-07-31 18:50:10
>>nawgz+m2
To view software in isolation is an equally silly representation. In the physical world, software is part of an overall system that needs to be considered holistically. Most major safety-critical mishaps are the result of several failures, often across different domains.

In the case of the 737MAX, the software was a design around a physical constraint; that doesn't mean the software doesn't matter. Most software is designed as a workaround of a certain physical or mental constraint.

6. bumby+16[view] [source] 2023-07-31 18:53:00
>>throwa+(OP)
Software does not wear out like most physical components do, but it often causes failures in the interaction/coordination between subsystems.

As the amount of coordination increases, the number of failure modes tends to grow quite fast. That's why software failures in physical, safety-critical systems are not trivially corrected. There are a lot of second order effects that need to be considered.

replies(1): >>Qem+jd
7. wizofa+b9[view] [source] [discussion] 2023-07-31 19:09:50
>>progra+a5
I find myself thinking about that a lot - mainly "how many more hours would have needed to be spent at stage A to avoid the hours being spent now to recover from problems our software is currently causing". And often if I'm honest with myself it's hard to see that the extra investment of time earlier on would have necessarily resulted in a net productivity gain. It would however likely be a less stressful way to work (building fire-proof code rather than putting fires out all the time), and rather more satisfying. As an engineer of any sort I think it's perfectly reasonable and justifiable to want to produce something of quality even if it takes longer and the consequences probably won't be that terrible if you just release the first thing you can slap together. Unfortunately others are almost entirely motivated by the (not entirely irrational) fear of what happens if you don't release something quickly enough.
8. crater+4a[view] [source] 2023-07-31 19:14:27
>>throwa+(OP)
Honestly, I can't imagine how someone who hasn't been living under a rock for the last half century could say this. Just one example: Knight Capital was the largest trader in U.S. equities, with a market share of 17.3% on NYSE and 16.9% on NASDAQ in 2012, right up until August 1, 2012, when it lost $460 million and 75% of its equity value because of a software error. What was left of it was acquired in December of that year.
9. Bossin+ha[view] [source] [discussion] 2023-07-31 19:15:15
>>progra+a5
It's funny you say that, because designing systems that work extremely well, have contingencies upon contingencies, and can be relied upon (e.g. as a life-critical system) is so time consuming and (I imagine) mind numbingly boring (e.g. reviews upon reviews of white papers to ensure that the system spec is scientifically sound) that I'd guess time is the last thing you'd get back from writing NASA-style applications.
10. Bossin+1b[view] [source] [discussion] 2023-07-31 19:18:15
>>crooke+81
If you're referring to MCAS in the 737, the software itself wasn't the main problem; I'd say the main problem was that it wasn't even a documented feature (let alone the engineering of the system itself).

The pilot couldn't even turn MCAS off originally. That's not a software thing, that's a "who the F designed this" thing.

11. Qem+jd[view] [source] [discussion] 2023-07-31 19:32:10
>>bumby+16
> Software does not wear out like most physical components.

It fails like buildings near fault lines, because the ground moves under them. Think broken dependencies, operating system obsolescence, et cetera.

replies(1): >>bumby+gu
12. bumby+gu[view] [source] [discussion] 2023-07-31 20:55:42
>>Qem+jd
I like this analogy. Although your example focused on software-centric coordination, I think it's important to also extend it to non-software systems.

An apropos and famous example is the Ariane 5 rocket mishap. The same validated software from the Ariane 4 was reused, but the hardware design changed. Specifically, the horizontal velocity of the Ariane 5 exceeded that of its predecessor and overflowed the 16-bit variable used to store it.
