zlacker

[parent] [thread] 8 comments
1. fizx+(OP)[view] [source] 2018-05-24 23:46:56
Bieber-related conversation might reasonably have been 3%. Updating his friend count was a constant source of lock contention on a particular mysql.

There were never explicitly dedicated databases/servers, unless you count the hotspot caused by whatever shard he hashed into.

replies(2): >>lwansb+H1 >>tzs+8h
2. lwansb+H1[view] [source] 2018-05-25 00:09:40
>>fizx+(OP)
You would think a HyperLogLog would be approximately good enough when follower counts reach tens of millions.
replies(1): >>fizx+F3
◧◩
3. fizx+F3[view] [source] [discussion] 2018-05-25 00:34:06
>>lwansb+H1
Hyperloglog was only just invented, and hadn't really made its way into the popular developer consciousness.

Anyhow, the solution is easier than that (implement lock striping, i.e. https://stackoverflow.com/questions/16151606/need-simple-exp...), but I was reorg'd away before my code hit production, so I'm not sure what happened there.

replies(1): >>lorenz+H6
◧◩◪
4. lorenz+H6[view] [source] [discussion] 2018-05-25 01:24:38
>>fizx+F3
Approximate counters are much much older than HyperLogLog. Here’s something from 1978: https://www.inf.ed.ac.uk/teaching/courses/exc/reading/morris.... Of course this solves a different problem because HyperLogLog is an, uh, interesting way to count followers (why would you need to do a count-distinct query? You can’t follow someone multiple times). In any case, the Flajolet Martin sketch dates back to 1985 and solves the same problem as HyperLogLog: https://en.wikipedia.org/wiki/Flajolet%E2%80%93Martin_algori...
replies(1): >>aflam+qj
5. tzs+8h[view] [source] 2018-05-25 04:16:14
>>fizx+(OP)
Did they actually need to know Bieber's friend count both accurately and in real time?

I would expect that for top Twitter accounts like his, you could get a very good approximation by logging events that increment or decrement the friend count, and using the rates those events are occurring and the last known exact friend count to forecast the current friend count. That would not be exact, but I bet you could get it close enough that users would not see anything off about it.

replies(2): >>mrgord+Hj >>segmon+Ca1
◧◩◪◨
6. aflam+qj[view] [source] [discussion] 2018-05-25 04:49:11
>>lorenz+H6
Here is nice presentation covering those: (link: https://www.cs.princeton.edu/~rs/talks/AC11-Cardinality.pdf)
◧◩
7. mrgord+Hj[view] [source] [discussion] 2018-05-25 04:53:38
>>tzs+8h
Yes they made Storm for this
◧◩
8. segmon+Ca1[view] [source] [discussion] 2018-05-25 14:26:26
>>tzs+8h
Ha, I always joke that Twitter is a good example of why one should avoid distributed systems. Every often, you see it showing the wrong number of friends, tweets, likes, etc. I don't care, I suppose some people care about such metrics.
replies(1): >>scarfa+Ig1
◧◩◪
9. scarfa+Ig1[view] [source] [discussion] 2018-05-25 15:05:05
>>segmon+Ca1
That's the essence of the CAP theorem. You can have two of three - consistency, availability, and partition tolerance. Sometimes it makes sense to choose availability over consistency.
[go to top]