zlacker

Twitter Is DDOSing Itself
1. brigad+Iv 2023-07-01 21:04:17
>>ZacnyL+(OP)
This is why you always use exponential backoff.
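
A minimal Python sketch of a retry loop with exponential backoff. `retry_with_backoff`, its parameters, and the injectable `sleep` hook are hypothetical names chosen for illustration, not any particular library's API. Passing `sleep` in lets the example record the waits instead of actually sleeping. For brevity it retries on any exception; a real client would usually retry only on errors it knows to be transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying on any exception with exponentially growing waits.

    With the defaults the waits are 1 s, 2 s, 4 s, 8 s; once max_attempts
    calls have failed, the last exception is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2  # exponential backoff: double the wait each time
    raise AssertionError("unreachable")


# Usage: a hypothetical operation that fails twice before succeeding.
calls = 0


def flaky() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("server busy")
    return "ok"


waits: list = []
print(retry_with_backoff(flaky, sleep=waits.append))  # ok
print(waits)  # [1.0, 2.0]
```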
2. fathyb+8z 2023-07-01 21:27:15
>>brigad+Iv
And when you're at Twitter scale, sprinkle some jitter too.
3. oblio+UK 2023-07-01 22:49:15
>>fathyb+8z
What do you mean?
4. wolfga+PL 2023-07-01 22:58:36
>>oblio+UK
Say you have a bug that caused 100,000 HTTP requests to hang, and you kick the node and make them all fail at once. One second later, 100,000 clients suddenly retry simultaneously, causing a huge spike in load which makes most of their requests fail. They use exponential backoff, so two seconds after that, 99,000 clients retry, causing a huge spike in load that makes most of their requests fail. Four seconds after that, 98,000 clients retry...
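
To see why plain doubling keeps the herd together, here is a minimal Python sketch (the helper `retry_times` is hypothetical) that computes when each client retries. It assumes every client failed at exactly t = 0 and, for simplicity, ignores the clients that succeed and drop out; with no randomness, every client computes the same schedule, so all 100,000 land on the same instants.

```python
from collections import Counter


def retry_times(base_delay: float, retries: int) -> list:
    """Seconds after the initial failure at which each retry fires, using
    plain doubling. With base_delay=1.0 the waits are 1, 2, 4, ... s and
    the retry times are 1, 3, 7, ... s."""
    times, t, delay = [], 0.0, base_delay
    for _ in range(retries):
        t += delay
        times.append(t)
        delay *= 2
    return times


CLIENTS = 100_000  # every client failed at the same moment, t = 0
schedules = [retry_times(1.0, 4) for _ in range(CLIENTS)]

# How many clients fire a retry at each instant?
load = Counter(t for s in schedules for t in s)
print(sorted(load.items()))
# [(1.0, 100000), (3.0, 100000), (7.0, 100000), (15.0, 100000)]
```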

If you introduce a bit of randomness into the retry timing (say, multiply each delay by a random factor between 1.8 and 2.2 instead of straight doubling), that thundering herd will spread itself out and be much easier to recover from.
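
A minimal Python sketch of the randomized multiplier described above. `jittered_retry_times` is a hypothetical helper; the 100,000 clients and the 1.8 to 2.2 factor come from the comment, while the fixed seed and the 100 ms bucket width are arbitrary choices for illustration. Under this exact scheme the first wait is not randomized, so everyone's first retry still lands at t = 1 s; the spread starts at the second retry and widens with each one after that.

```python
import random
from collections import Counter


def jittered_retry_times(base_delay: float, retries: int, rng: random.Random) -> list:
    """Like plain doubling, but each wait is the previous wait multiplied
    by a random factor drawn uniformly from [1.8, 2.2] instead of exactly 2."""
    times, t, delay = [], 0.0, base_delay
    for _ in range(retries):
        t += delay
        times.append(t)
        delay *= rng.uniform(1.8, 2.2)  # jitter: roughly, not exactly, double
    return times


CLIENTS = 100_000
rng = random.Random(42)  # fixed seed so the run is reproducible
schedules = [jittered_retry_times(1.0, 4, rng) for _ in range(CLIENTS)]

# The first wait is not randomized, so the first retry is still aligned.
print(Counter(s[0] for s in schedules))  # Counter({1.0: 100000})

# By the 4th retry the herd has spread out: bucket retry times into
# 100 ms windows and report the busiest window.
fourth = Counter(round(s[3], 1) for s in schedules)
print(min(s[3] for s in schedules), max(s[3] for s in schedules))
print(max(fourth.values()), "clients in the busiest 100 ms window")
```

Without jitter, all 100,000 clients would hit the server in the same instant at t = 15 s; here the 4th retries are scattered across several seconds, so no single 100 ms window sees more than a small fraction of them.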