Some failures are fairly common, and an individual failure might be fairly inert on its own yet have far more serious consequences when it cascades with another specific failure. For example: cruise control enabled + failure of the steering wheel control pad _and_ a previously undetected failure of the brake sensor/brake light circuit = cruise control stuck ON. Each of those failures is inert if the cruise control happens to be OFF when it occurs. Contrived example, but you get the idea ...
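None of this is real automotive logic, but a toy sketch like the following (Python, every condition name invented) shows why each fault can be harmless alone while the specific combination is not:

```python
from itertools import product

# Hypothetical single faults; each is inert (or nearly so) on its own.
FAULTS = ["wheel_pad_dead", "brake_circuit_dead"]

def hazardous(cruise_on: bool, active_faults: set[str]) -> bool:
    # Cruise can only get "stuck ON" if it was already engaged, the driver's
    # cancel input is gone, AND the brake-pedal cancel path has silently
    # failed. Any two of the three conditions is still recoverable.
    return (
        cruise_on
        and "wheel_pad_dead" in active_faults
        and "brake_circuit_dead" in active_faults
    )

# Enumerate every combination of cruise state and fault subset.
for cruise_on in (False, True):
    for combo in product([False, True], repeat=len(FAULTS)):
        active = {f for f, present in zip(FAULTS, combo) if present}
        if hazardous(cruise_on, active):
            print("HAZARD:", "cruise ON +", ", ".join(sorted(active)))
```

Only one of the eight combinations prints anything, which is exactly the kind of case a single-fault analysis can miss.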
I have seen a lot of FMEDA (and other tool) use lately to combat concerns about cascading failure, but I'm not sure what's currently standard at NASA or how they deal with this. I would think cascading failure would be their expected scenario on a 10+ year unmanned mission.
Here's how I would describe the possible analysis approaches in broad terms, avoiding terminology that NASA does not officially use.
- Start from the hazard of being pointed in the wrong direction and work backwards to identify the causes, forming a tree.
- Start from the event of commanding the wrong direction and work forwards to identify mitigations or the lack thereof, also forming a tree.
- Start by looking at a component or subsystem and list all the ways it can fail, without regard for the application. Then consider the application and work up towards the causes/events.
- Close any gaps between the top-down and bottom-up approaches (a toy sketch of both directions follows this list).
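To make the top-down/bottom-up distinction concrete, here's a minimal sketch in Python (every gate and event name is invented) of a two-gate fault tree plus a brute-force search for its minimal cut sets, which is roughly what the bottom-up pass should independently rediscover:

```python
from itertools import combinations

# Invented basic events for a toy "pointed in the wrong direction" tree.
EVENTS = ["sun_sensor_fails", "star_tracker_fails", "bad_attitude_command"]

def top_event(active: set[str]) -> bool:
    # OR gate at the top: the hazard occurs if attitude knowledge is lost
    # (AND of both reference losses) or a wrong command gets through.
    knowledge_lost = {"sun_sensor_fails", "star_tracker_fails"} <= active
    return knowledge_lost or "bad_attitude_command" in active

# Crude minimal-cut-set search: the smallest fault combinations that
# reach the top event, checked from single faults upward.
cut_sets = []
for size in range(1, len(EVENTS) + 1):
    for combo in combinations(EVENTS, size):
        s = set(combo)
        if top_event(s) and not any(c <= s for c in cut_sets):
            cut_sets.append(s)

# Two minimal cut sets: the bad command alone, or both references lost together.
print(cut_sets)
```

Real tools do this over thousands of events with probabilities attached, but the shape of the question is the same.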
"Software Fault Tree Analysis (SFTA) is a top-down approach to failure analysis which begins with thinking about potential failures or malfunctions (What could go wrong?) and then thinking through all the possible ways that such a failure or malfunction could occur. Fault Tree Analysis (FTA), is often used by the hardware teams to identify potential hazards that might be caused by failures in hardware components or systems, but with the SFTA, the software isn’t considered the hazard, but it can be a cause or contributor when considered in the context of the system."
"The Software Failure Modes and Effects Analysis (SFMEA) is a bottom up approach where each component is examined and all the possible ways it can fail are listed. Each possible failure is traced through the system to see what effect it might have on the system and to determine if it results in a hazardous state. Then the likelihood of the failure and the severity of the system failure can be considered."
But, to the earlier post, these are driven by hard requirements; specifically adherence to NASA STD 7150.2 and NPR 7150.2. Developers/contractors can tailor/waive them with pre-approval, but in general they tend to go in the direction of fewer requirements, not more. This may all be moot because I think Voyager pre-dates any of those requirement documents, and I'm not sure what existed in the late 1970s.
Edit: The comment you deleted right before I could reply was good! I think people would enjoy and benefit from your description of how the process works if you're willing to repost it.
As you noted, the reliability requirement did in fact flow down from an engineering requirement, which is why they exceeded the minimum FMEA standards. There's no official guidance on where and how exactly to track that information, so they put it in the usual place but in an unusual way. The lack of a standard during Voyager's time probably affected the visibility of the work more than its substance.