zlacker

1. swarni+(OP)[view] [source] 2023-07-31 12:47:13
Amazing that someone thought up a solution to a hypothetical problem 46 years ago, then fired it 30 billion km away
replies(6): >>behnam+m6 >>bumby+09 >>rcxdud+wj >>jjk166+zQ >>whartu+PQ >>kdinn+BE2
2. behnam+m6[view] [source] 2023-07-31 13:30:43
>>swarni+(OP)
Sometimes we don’t give enough credit to previous generations.
replies(1): >>detour+0A1
3. bumby+09[view] [source] 2023-07-31 13:45:31
>>swarni+(OP)
Aerospace has a very high quality standard compared to other industries.

Lots of formal processes capture what would otherwise be informal design decisions elsewhere. In this case, they probably have reams of pages detailing a failure mode and effects analysis (FMEA). One failure mode is “oops, we sent the wrong command,” and the document would define specific design mitigations for that outcome until the residual risk falls to an accepted threshold.
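
A minimal sketch of that "mitigate until the risk is acceptable" loop, assuming a toy model in which each mitigation multiplies the residual risk by a fixed factor. The function name, the mitigation names, the factors, and the threshold are all hypothetical illustrations, not values from any NASA document.

```python
def mitigate_until_acceptable(initial_risk, mitigations, threshold):
    """Apply mitigations in order until the residual risk is acceptable.

    mitigations: list of (name, factor) pairs; each factor (0..1) is an
    assumed multiplier on the remaining risk, purely for illustration.
    Returns (residual_risk, applied_names, accepted).
    """
    risk = initial_risk
    applied = []
    for name, factor in mitigations:
        if risk <= threshold:
            break  # already acceptable, no further design changes needed
        risk *= factor
        applied.append(name)
    return risk, applied, risk <= threshold


# Hypothetical failure mode: "oops, we sent the wrong command".
risk, applied, accepted = mitigate_until_acceptable(
    initial_risk=0.1,
    mitigations=[
        ("command validation on the ground", 0.1),
        ("periodic automatic reorientation", 0.1),
        ("extra review sign-off", 0.5),
    ],
    threshold=0.005,
)
```

In this made-up run the first two mitigations are enough, so the third is never applied. A real FMEA records this reasoning in a worksheet rather than in code.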

replies(1): >>aether+n62
4. rcxdud+wj[view] [source] 2023-07-31 14:25:41
>>swarni+(OP)
It's not really hypothetical: losing communication with stuff in space is a very common failure mode and a huge amount of the system design is focused on making it as unlikely as possible (generally the radio system gets a huge priority in almost everything and there are a lot of failsafes built at every level to make it possible to reestablish communication if anything disrupts it).
replies(2): >>JdeBP+QC >>dekhn+pl2
◧◩
5. JdeBP+QC[view] [source] [discussion] 2023-07-31 15:35:00
>>rcxdud+wj
Indeed. Voyager 2 has in fact been listening via its backup receiver since 1978.
6. jjk166+zQ[view] [source] 2023-07-31 16:27:37
>>swarni+(OP)
It wasn't a solution for this specific problem. Spacecraft orientations are going to drift over time, and periodically rehoming is the simplest way of dealing with it. That it doesn't care whether the orientation drift was natural or artificial is just a bonus.
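
A rough sketch of why a periodic reset is indifferent to the cause of the error, assuming a one-dimensional pointing error in degrees. The function, the drift rate, the reset interval, and the size of the bad command are all made-up parameters, not Voyager values.

```python
def simulate_pointing(days, rehome_every, drift_per_day, bad_command=None):
    """Return the pointing error (degrees) at the end of each day.

    bad_command: optional (day, offset_deg) pair modelling an erroneous
    command that knocks the antenna off target on that day.
    """
    error = 0.0
    history = []
    for day in range(1, days + 1):
        error += drift_per_day  # slow natural drift
        if bad_command is not None and bad_command[0] == day:
            error += bad_command[1]  # operator-induced offset
        if day % rehome_every == 0:
            error = 0.0  # rehome: the cause of the error is irrelevant
        history.append(error)
    return history


history = simulate_pointing(
    days=10, rehome_every=5, drift_per_day=0.1, bad_command=(3, 2.0)
)
```

The reset on day 5 clears both the accumulated drift and the offset from the bad command on day 3, without ever distinguishing between them.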
7. whartu+PQ[view] [source] 2023-07-31 16:28:29
>>swarni+(OP)
The Voyager that's flying now is not necessarily the Voyager that was launched.

The hardware is the same, but they've updated, patched, and rewritten the software that's running in it throughout the years.

I'm not suggesting that the failsafe mode wasn't originally considered and implemented, but simply that it doesn't have to be the case. They could have made changes to it over time.

replies(1): >>Shawnj+OFh
◧◩
8. detour+0A1[view] [source] [discussion] 2023-07-31 19:52:53
>>behnam+m6
I only give credit to previous generations. Firm believer that we only understand in retrospect.
◧◩
9. aether+n62[view] [source] [discussion] 2023-07-31 22:44:49
>>bumby+09
FMEDA probably. And in recent times, fault tree analysis seems to be better for complex systems.
replies(1): >>bumby+Ku2
◧◩
10. dekhn+pl2[view] [source] [discussion] 2023-08-01 00:32:30
>>rcxdud+wj
I was amused to learn that if modern satellites lose contact with earth, they go into "safe mode": pointing towards sun, solar panels fully deployed, everything else except telemetry, radio, and temperature management disabled, waiting for further instructions. https://en.wikipedia.org/wiki/Safe_mode_in_spacecraft

Imagine deploying a billion dollar piece of hardware and hoping that it has enough intelligence to keep itself from burning up before you can reestablish contact!
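
A minimal sketch of the power-management part of that behaviour, assuming a simple "hours since last contact" timer. The subsystem names, the timeout, and the function are hypothetical, and sun pointing and panel deployment are not modelled.

```python
# Subsystems that stay powered in safe mode (per the description above).
ESSENTIAL = frozenset({"telemetry", "radio", "thermal_control"})


def next_power_state(subsystems, hours_since_contact, timeout_hours):
    """Return {subsystem: powered} for the next control cycle.

    While contact is recent, the current state is kept as-is. Once the
    (assumed) timeout expires, only essential subsystems remain powered.
    """
    if hours_since_contact < timeout_hours:
        return dict(subsystems)
    return {name: name in ESSENTIAL for name in subsystems}


state = {
    "telemetry": True,
    "radio": True,
    "thermal_control": True,
    "camera": True,
    "science_payload": True,
}
safe = next_power_state(state, hours_since_contact=30, timeout_hours=24)
```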

◧◩◪
11. bumby+Ku2[view] [source] [discussion] 2023-08-01 01:53:44
>>aether+n62
As far as I’m aware, no NASA standards call for FMEDA. It doesn’t mean a project manager couldn’t levy it, but it’s not often that a contractor adds additional requirements to a gov funded build.
replies(1): >>aether+3C2
◧◩◪◨
12. aether+3C2[view] [source] [discussion] 2023-08-01 03:06:54
>>bumby+Ku2
FMEA relies on a really smart person anticipating all the different combinations of failures worth exploring (NxM), not just N or M.

Some failures are fairly common, and individual failures might be fairly inert but have more serious consequences if they cascade with another specific failure. For example, cruise control enabled + failure of the steering wheel control pad _and_ a previously undetected failure of the brake sensor/brake light circuit = cruise control stuck ON. Actually, this failure is inert if the cruise control is OFF when it happens. Contrived example, but you get the idea ...
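
A small sketch of that contrived example, assuming the driver has exactly two ways to cancel cruise control (the control pad and the brake pedal). The fault names and the model are hypothetical; the point is only that enumerating combinations finds what a one-fault-at-a-time review misses.

```python
from itertools import combinations

FAULTS = ("control_pad_failed", "brake_sensor_failed")


def cruise_stuck_on(cruise_enabled, faults):
    """True if cruise control is on and no cancel path still works."""
    pad_works = "control_pad_failed" not in faults
    brake_works = "brake_sensor_failed" not in faults
    return cruise_enabled and not (pad_works or brake_works)


def hazardous_combinations(cruise_enabled):
    """List every combination of faults that leaves cruise stuck on."""
    found = []
    for size in range(len(FAULTS) + 1):
        for combo in combinations(FAULTS, size):
            if cruise_stuck_on(cruise_enabled, set(combo)):
                found.append(combo)
    return found


with_cruise_on = hazardous_combinations(cruise_enabled=True)
with_cruise_off = hazardous_combinations(cruise_enabled=False)
```

With cruise enabled, only the double fault is hazardous; with cruise off, nothing is. The number of combinations grows quickly with the number of faults, which is why this gets hard on real systems.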

I have seen a lot of FMEDA (and other tool) use lately to combat concerns with cascading failure, but not sure what's currently standard at NASA or how they deal with this. I would think cascading failure would be their expected scenario on a 10+ year unmanned mission.

replies(1): >>bumby+cn3
13. kdinn+BE2[view] [source] 2023-08-01 03:39:11
>>swarni+(OP)
Actually there are a couple of workarounds for this problem, as they anticipated it all along. My father was Director of Operations at Tidbinbilla deep space tracking station, which ran most of the comms to Voyager 1.

I am paraphrasing what he said as a non-technical person: Voyager has both a dish receiver, and a pole antenna. The dish is the usual mechanism for comms but in an emergency such as this they would send commands to the other antenna. To do this they would turn the main tracking station dish up to max, and send a "TURN AROUND!" signal out.

But prior to that they had to alert the local electricity grid, and the local air traffic control to not have any planes flying over at the time!

I guess the Voyagers are too far away for this manoeuvre now.

◧◩◪◨⬒
14. bumby+cn3[view] [source] [discussion] 2023-08-01 11:35:05
>>aether+3C2
NASA STDs, handbooks, guidebooks, NPDs and NPRs are all open-source. They don’t mention FMEDA, and they don’t generally have a detectability column in their FMEA. IMO they are a little outdated.
replies(1): >>sheeps+uD3
◧◩◪◨⬒⬓
15. sheeps+uD3[view] [source] [discussion] 2023-08-01 13:35:56
>>bumby+cn3
I've done for NASA what they were calling FMECA and FTA for a subsystem. They had a lot of freedom to tailor the analysis to the situation, and the end result didn't quite match anything established. We addressed detection in some of the FMECA columns which are not traditionally for detection; and events in some of the FTA. It was a contortion of terminology and format to modernize and maximize the value of the analysis given their limited resources and the bureaucracy of what they were allowed/required to do on paper.

Here's how I would describe the possible analysis approaches in broad terms, avoiding terminology that NASA does not officially use.

- Start from the hazard of being pointed in the wrong direction and work backwards to identify the causes, forming a tree.

- Start from the event of commanding the wrong direction and work forwards to identify mitigations or the lack thereof, also forming a tree.

- Start by looking at a component or subsystem and list all the ways it can fail without regard for the application. Then consider the application and work up towards the causes/events.

- Close any gaps between the top-down and bottom-up approaches.
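
A toy sketch of the first and last steps above, assuming a fault tree made of AND/OR gates over named basic events. The tree, the event names, and the bottom-up list are invented for illustration and do not describe any real spacecraft.

```python
def occurs(node, active):
    """Evaluate a fault tree node against a set of active basic events.

    A node is either a basic-event name (str) or a (gate, children) pair
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return node in active
    gate, children = node
    results = [occurs(child, active) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


def basic_events(node):
    """Collect every basic event that appears in the tree."""
    if isinstance(node, str):
        return {node}
    _, children = node
    found = set()
    for child in children:
        found |= basic_events(child)
    return found


# Top-down: hazard "pointed in the wrong direction" (hypothetical tree).
POINTED_WRONG = ("OR", [
    ("AND", ["wrong_pointing_command", "no_automatic_reorientation"]),
    "attitude_sensor_failure",
])

# Bottom-up: failure modes listed per component (hypothetical list).
bottom_up = {"attitude_sensor_failure", "thruster_valve_stuck"}

# Gap: found bottom-up but not yet represented in the tree.
gaps = bottom_up - basic_events(POINTED_WRONG)
```

Here a wrong command alone does not trigger the hazard because the reorientation mitigation is still in place, and the gap check flags the one failure mode the tree has not accounted for.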

replies(1): >>bumby+zH3
◧◩◪◨⬒⬓⬔
16. bumby+zH3[view] [source] [discussion] 2023-08-01 13:58:38
>>sheeps+uD3
Yes, what you're describing is two different approaches for safety analysis. According to the NASA software engineering handbook [1]:

"Software Fault Tree Analysis (SFTA) is a top-down approach to failure analysis which begins with thinking about potential failures or malfunctions (What could go wrong?) and then thinking through all the possible ways that such a failure or malfunction could occur. Fault Tree Analysis (FTA), is often used by the hardware teams to identify potential hazards that might be caused by failures in hardware components or systems, but with the SFTA, the software isn’t considered the hazard, but it can be a cause or contributor when considered in the context of the system."

"The Software Failure Modes and Effects Analysis (SFMEA) is a bottom up approach where each component is examined and all the possible ways it can fail are listed. Each possible failure is traced through the system to see what effect it might have on the system and to determine if it results in a hazardous state. Then the likelihood of the failure and the severity of the system failure can be considered."

But, to the earlier post, these are driven by hard requirements; specifically adherence to NASA STD 7150.2 and NPR 7150.2. Developers/contractors can tailor/waive them with pre-approval but, in general, they tend to go in the direction of fewer requirements, not more. This may all be moot because I think Voyager pre-dates any of those requirement documents and I'm not sure what existed in the late 1970s.
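
To make the bottom-up tracing in the SFMEA quote concrete, here is a minimal sketch, assuming the system can be described as a map from each failure or effect to the effects it causes. The map, the names, and the hazard set are hypothetical and are not taken from the handbook.

```python
from collections import deque


def reachable_effects(failure, affects):
    """Breadth-first trace of every downstream effect of one failure."""
    seen = set()
    queue = deque([failure])
    while queue:
        current = queue.popleft()
        for effect in affects.get(current, ()):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    return seen


def is_hazardous(failure, affects, hazards):
    """True if the failure can propagate to any hazardous state."""
    return bool(reachable_effects(failure, affects) & hazards)


# Hypothetical effect map for illustration only.
AFFECTS = {
    "star_tracker_bad_reading": ["attitude_estimate_wrong"],
    "attitude_estimate_wrong": ["antenna_mispointed"],
    "antenna_mispointed": ["loss_of_communication"],
    "heater_relay_stuck_on": ["power_margin_reduced"],
}
HAZARDS = {"loss_of_communication"}

tracker_is_hazard = is_hazardous("star_tracker_bad_reading", AFFECTS, HAZARDS)
heater_is_hazard = is_hazardous("heater_relay_stuck_on", AFFECTS, HAZARDS)
```

Likelihood and severity would then be assessed only for the failures that actually reach a hazardous state.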

[1] https://swehb.nasa.gov/

replies(1): >>sheeps+PF4
◧◩◪◨⬒⬓⬔⧯
17. sheeps+PF4[view] [source] [discussion] 2023-08-01 17:58:20
>>bumby+zH3
The D aspect of the FMEA I worked on was motivated by a reliability requirement, not by 7150.2. 70's NASA was using FTA and FMEA but avoiding putting numbers on top-level analysis. I imagine they did whatever ad-hoc analysis they thought was necessary for such a highly publicized mission even if it wasn't a separate deliverable.

Edit: The comment you deleted right before I could reply was good! I think people would enjoy and benefit from your description of how the process works if you're willing to repost it.

As you noted the reliability requirement did in fact flow down from an engineering requirement which is why they exceeded the minimum FMEA standards. There's no official guidance on where and how exactly to track that information so they put it in the usual place but in an unusual way. The lack of a standard during Voyager's time probably impacted the visibility of the work more than the substance.

replies(1): >>aether+pba
◧◩◪◨⬒⬓⬔⧯▣
18. aether+pba[view] [source] [discussion] 2023-08-03 03:16:45
>>sheeps+PF4
This thread was a good read, thanks.
◧◩
19. Shawnj+OFh[view] [source] [discussion] 2023-08-05 07:12:01
>>whartu+PQ
It’s possible to update the Voyager FSW?