1. timbra+ (OP) 2019-05-27 15:48:45
Um, I’ve been at AWS since late 2014, and AFAIK the only extended SQS hiccup correlated with the DynamoDB issue in 2016. SQS isn’t perfect but I’m pretty sure “does routinely have temporary failures that generally last for a few ours at a time” is just wrong.
2. noelhe+L 2019-05-27 15:54:40
>>timbra+(OP)
I believe GP was talking about particular messages failing, not a total system outage. In my use of AWS, the status page almost never reports an outage even when a service is down for me; the most I've ever seen is some hand-wavy message that there are elevated error rates. So you could be right that SQS hasn't failed entirely, but that probably just means there's a good number of failed requests that fall below the threshold where AWS would consider it down.
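If individual requests fail at a low rate without any outage being declared, the usual client-side mitigation is to retry transient errors with capped exponential backoff and jitter. A minimal sketch, assuming a hypothetical `TransientError` that stands in for whatever throttling/5xx error your SDK raises, and an arbitrary `op` callable wrapping the actual send or receive call (this is not the AWS SDK's API, and the SDKs already do some retrying of their own):

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error (throttling, 5xx) from the queue service."""


def call_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op, retrying on TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # The delay ceiling doubles each attempt (0.1s, 0.2s, 0.4s, ...) up to max_delay;
            # sleeping a random amount below it spreads out retries from many clients.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Note that retries only help with short error bursts; a partial failure that lasts 20 minutes or more will exhaust any reasonable retry budget, so the caller still needs somewhere durable to park messages it could not send.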
3. dantil+V3 2019-05-27 16:18:45
>>noelhe+L
Yes, this is correct, thank you. I updated my comment to indicate that I meant partial failure, though the failure conditions persist for anywhere from 20 minutes to a few hours. In my experience, those partial failures have happened about once every two months.

Technically, it's not even really a failure of SQS because the guarantees SQS makes are so weak that those partial failures are really "operating normally."

4. panark+1L 2019-05-27 22:57:58
>>noelhe+L
I've seen this behavior occasionally, too.

Producer system puts 500 messages on the queue. Consumer system can't see anything for 90 minutes. Then mysteriously the messages show up.

The status page stays green, not even a note about elevated error rates.
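Since the status page won't tell you about this kind of delay, one way to notice it yourself is to have the producer stamp each message with its send time and have the consumer measure delivery lag on receipt. A minimal sketch, assuming a hypothetical `Message` type with a producer-set `sent_at` field (not an SQS attribute) and an arbitrary 5-minute alert threshold; producer and consumer clocks must be reasonably in sync for the numbers to mean anything:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Message:
    body: str
    sent_at: float  # hypothetical: producer's Unix timestamp in seconds, written into the message


def delayed_messages(
    messages: Iterable[Message],
    received_at: float,
    threshold_s: float = 300.0,
) -> List[Message]:
    """Return the messages whose send-to-receive lag exceeds threshold_s seconds."""
    return [m for m in messages if received_at - m.sent_at > threshold_s]
```

In the scenario above, a message sent at t=0 and first seen at t=5400 (90 minutes later) would be flagged, and the consumer could emit a metric or alarm instead of waiting for AWS to acknowledge a problem.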
