1. dralle+(OP)[view] [source] 2022-07-09 02:16:25
I hope this is only temporary. Where else will we discuss AWS outages when AWS goes down?

Not even a joke.

replies(5): >>cherio+h >>f0e4c2+X >>banana+p1 >>Pakdef+m4 >>lazyli+PE
2. cherio+h[view] [source] 2022-07-09 02:18:12
>>dralle+(OP)
The IP appears to be us-west-2, so we will still be able to discuss us-east-1 outages alright!
replies(1): >>ignora+k5
3. f0e4c2+X[view] [source] 2022-07-09 02:22:45
>>dralle+(OP)
If architected correctly, stuff deployed into AWS stays up during all but the most extreme outages.
replies(2): >>oars+o2 >>fulafe+ey
4. banana+p1[view] [source] 2022-07-09 02:26:38
>>dralle+(OP)
It appears to be pointing at a bare EC2 instance, no doubt a lift-and-shift.

Even during the most extreme AWS events, my EC2 instances running dedicated servers kept seeing Internet traffic.

replies(1): >>pmoria+FM
◧◩
5. oars+o2[view] [source] [discussion] 2022-07-09 02:32:41
>>f0e4c2+X
And then where do we discuss these extreme outages?
replies(5): >>shawnz+Y2 >>rubyis+b3 >>static+Xa >>VoidWh+vd >>mekste+Ed
◧◩◪
6. shawnz+Y2[view] [source] [discussion] 2022-07-09 02:37:46
>>oars+o2
Where did you discuss outages of HN's previous provider?
replies(1): >>dralle+h8
◧◩◪
7. rubyis+b3[view] [source] [discussion] 2022-07-09 02:38:51
>>oars+o2
reddit.... shudders
replies(1): >>marioj+wl
8. Pakdef+m4[view] [source] 2022-07-09 02:45:56
>>dralle+(OP)
Twitter is nice for learning about outages...
◧◩
9. ignora+k5[view] [source] [discussion] 2022-07-09 02:53:02
>>cherio+h
but... the S3 buckets are in us-east-1, and Postgres is in ap-southeast-2. More regions are better than one, for maximum impact with minimum effort.
replies(2): >>pojzon+Zt >>pmoria+nL
◧◩◪◨
10. dralle+h8[view] [source] [discussion] 2022-07-09 03:19:11
>>shawnz+Y2
Fewer people are interested in outages of HN's previous provider.
◧◩◪
11. static+Xa[view] [source] [discussion] 2022-07-09 03:43:39
>>oars+o2
If there's an extreme AWS outage, you're pretty much stuck with in-person or POTS.
◧◩◪
12. VoidWh+vd[view] [source] [discussion] 2022-07-09 04:09:51
>>oars+o2
HAM Radio
replies(1): >>namech+Mg
◧◩◪
13. mekste+Ed[view] [source] [discussion] 2022-07-09 04:11:13
>>oars+o2
Better to discuss it the next day, when it's over, than to see a bunch of upset comments posted in real time.
replies(1): >>atmosx+nc1
◧◩◪◨
14. namech+Mg[view] [source] [discussion] 2022-07-09 04:40:29
>>VoidWh+vd
People joke, but from the top of a decent mountain I can reach 45 miles into the Bay Area on a $150 2m radio in my truck with a rooftop antenna.
replies(2): >>loxias+Qm >>iasay+IA
◧◩◪◨
15. marioj+wl[view] [source] [discussion] 2022-07-09 05:28:50
>>rubyis+b3
maybe not...

https://aws.amazon.com/solutions/case-studies/reddit-aurora-...

◧◩◪◨⬒
16. loxias+Qm[view] [source] [discussion] 2022-07-09 05:40:53
>>namech+Mg
That is so cool. If I had that (truck+radio) I'd be very tempted to scatter a few cheap relay nodes at good locations in the bay area. Monkeybrains+comcast is quite reliable.
◧◩◪
17. pojzon+Zt[view] [source] [discussion] 2022-07-09 06:59:21
>>ignora+k5
Is there at least one case where a whole region went down? ;))
replies(2): >>within+1v >>samspe+lV1
◧◩◪◨
18. within+1v[view] [source] [discussion] 2022-07-09 07:10:30
>>pojzon+Zt
At least once, in 2012? I remember because just two days before we had fully switched over to AWS and hadn’t done multi-region yet. We got our first downtime in 10 years and there was nothing we could do, unlike when we had the servers in a colo. We were at the mercy of Amazon. After that, we moved everything back to real physical servers.
◧◩
19. fulafe+ey[view] [source] [discussion] 2022-07-09 07:42:47
>>f0e4c2+X
Of those "correctly" architected apps, most are not properly tested for failovers and won't actually work as architected (because of your own bugs, or because the AWS failover stuff has bugs and you can't even test it).

E.g., the app falls over due to steep traffic spikes caused by outages: autoscaling mechanisms get previously unseen levels of load increase and enter some yo-yo oscillation pattern; a whole AZ is overloaded because all the failovers from the other, failing AZ trigger at once; instances hit circuit breakers, or spin up too slowly to ever pass health checks; etc. Or the app can't detect something becoming glacially slow but not outright failing.

See eg https://www.theverge.com/2021/12/22/22849780/amazon-aws-is-d... & https://www.theverge.com/2020/11/25/21719396/amazon-web-serv... etc (many more examples are out there)
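
To illustrate the "glacially slow but not outright failing" case, here is a minimal sketch of a health check that looks at latency as well as errors. The function name, parameters, and thresholds are all hypothetical, chosen only for illustration; this is not any AWS API.

```python
def is_healthy(latencies_ms, error_count, max_p95_ms=500.0, max_error_rate=0.05):
    """Return True only if recent requests were both successful and fast enough.

    latencies_ms: latencies of recent requests (hypothetical sample window)
    error_count:  how many of those requests failed
    The thresholds are illustrative assumptions, not recommended values.
    """
    if not latencies_ms:
        # No data at all is treated as unhealthy rather than assumed fine.
        return False

    error_rate = error_count / len(latencies_ms)

    # Simple nearest-rank style 95th percentile over the sample window.
    ordered = sorted(latencies_ms)
    index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    p95 = ordered[index]

    # A backend that answers every request, but very slowly, still fails here.
    return error_rate <= max_error_rate and p95 <= max_p95_ms
```

A check that only counts errors would report the slow backend as healthy, so failover would never trigger.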

◧◩◪◨⬒
20. iasay+IA[view] [source] [discussion] 2022-07-09 08:09:15
>>namech+Mg
The problem with that is the only people you have to talk to are hams.

(Ex ham)

replies(1): >>nwh5jg+fJ
21. lazyli+PE[view] [source] 2022-07-09 09:04:25
>>dralle+(OP)
/r/sysadmin
◧◩◪◨⬒⬓
22. nwh5jg+fJ[view] [source] [discussion] 2022-07-09 10:02:21
>>iasay+IA
Well same for computers & the internet :)
replies(1): >>iasay+hK
◧◩◪◨⬒⬓⬔
23. iasay+hK[view] [source] [discussion] 2022-07-09 10:16:10
>>nwh5jg+fJ
Fair point!
◧◩◪
24. pmoria+nL[view] [source] [discussion] 2022-07-09 10:28:27
>>ignora+k5
The more regions a service is scattered over the greater the odds are that a single region outage somewhere in AWS will take down the whole service.

Consider the extreme case where your service is scattered over every AWS region: here an outage of any AWS region is guaranteed to take down your service.

Compare that to the case where your service is bound to only one region: then the odds of a single region outage taking down your entire service are reduced to 1 out of however many regions AWS has (assuming each region has an equal chance of suffering an outage).

To guard against outages, the failover service has to be scattered over entirely different regions (or, even better, on an entirely different service provider... which is probably a good idea anyway).
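
A rough sketch of the arithmetic behind this, assuming (purely for illustration) that each region is down independently with the same probability `p`. Both function names are hypothetical, and real outages are not independent, so treat the numbers as a toy model only.

```python
def p_down_all_required(p, k):
    """Service needs ALL k regions up (e.g. S3 in one, Postgres in another).

    It is down whenever at least one of the k regions is down.
    """
    return 1 - (1 - p) ** k


def p_down_any_sufficient(p, k):
    """Service has full failover: ANY one of k regions is enough.

    It is down only when all k regions are down at the same time.
    """
    return p ** k
```

With `p = 0.01`, a service that requires three regions is down about 3% of the time, while a service that can fail over between two regions is down about 0.01% of the time. Scattering dependencies raises the risk; replicating the whole service lowers it.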

replies(1): >>ignora+w11
◧◩
25. pmoria+FM[view] [source] [discussion] 2022-07-09 10:42:52
>>banana+p1
> Even during the most extreme AWS events, my EC2 instances running dedicated servers kept seeing Internet traffic.

You were just lucky enough not to have been affected by AWS outages, but many others were.

You can get a lot of resilience to failure on AWS, but simply spinning up a dedicated EC2 instance is not nearly enough.

◧◩◪◨
26. ignora+w11[view] [source] [discussion] 2022-07-09 12:58:30
>>pmoria+nL
> The more regions a service is scattered over the greater the odds are that a single region outage somewhere in AWS will take down the whole service.

Agree. I think I should have suffixed a /s to my comment above.

> To guard against outages, the failover service has to be scattered over entirely different regions (or, even better, on an entirely different service provider... which is probably a good idea anyway).

Something, something... the greatest trick the devil (bigcloud) ever pulled...

◧◩◪◨
27. atmosx+nc1[view] [source] [discussion] 2022-07-09 14:18:22
>>mekste+Ed
If the AWS status wasn’t a static HTML page, I would agree.
◧◩◪◨
28. samspe+lV1[view] [source] [discussion] 2022-07-09 18:48:52
>>pojzon+Zt
Per the spreadsheet here https://awsmaniac.com/aws-outages/ :

There seem to have been multiple "full" outages in 2011-12 in AWS' us-east-1 region, which, granted, is the oldest AWS region and likely has a bunch of legacy stuff. By "full" outages I mean that a few core services fell over but the entire region became inaccessible due to those core failures.

replies(1): >>pojzon+NMc
◧◩◪◨⬒
29. pojzon+NMc[view] [source] [discussion] 2022-07-13 07:22:30
>>samspe+lV1
That's over 10 years ago, though. Are there any RECENT full-region outages?

I'm foreseeing a full downtime in Frankfurt this winter, though. Germany is in a really bad position when it comes to electricity.
