https://gizmodo.com/5632095/justin-bieber-has-dedicated-serv...
Scaling a social network is inherently a hard problem, especially if you have a large userbase with a few very popular users. StackShare recently published a nice blog post about how we at Stream solve this for 300 million users with Go, RocksDB and Raft: https://stackshare.io/stream/stream-and-go-news-feeds-for-ov...
I think the most important part is using a combination of push and pull: you keep the most popular users in memory and serve their activity at read time, while for everyone else you use the traditional fan-out-on-write approach. The other thing that helped us scale was Go + RocksDB; the throughput is just so much higher compared to traditional databases like Cassandra.
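To make the push/pull split concrete, here is a rough sketch of the write path. Everything in it is illustrative: the types, the in-memory stores and the follower threshold are made up for the example, not Stream's actual code.

```go
package feed

// Rough sketch of a hybrid push/pull write path. All types, field names and
// the follower threshold below are illustrative, not Stream's actual code.

type Activity struct {
	AuthorID  string
	Payload   string
	Timestamp int64
}

// Above this follower count we treat an author as "popular" and skip fan-out.
const popularThreshold = 100000

type FeedService struct {
	followerCount  map[string]int        // authorID -> number of followers
	followersOf    map[string][]string   // authorID -> follower IDs
	feeds          map[string][]Activity // userID -> precomputed (pushed) feed
	popularAuthors map[string][]Activity // authorID -> posts pulled at read time
}

// AddActivity fans out on write for normal users (push) and stores posts from
// very popular users once, to be merged into timelines at read time (pull).
func (s *FeedService) AddActivity(a Activity) {
	if s.followerCount[a.AuthorID] >= popularThreshold {
		s.popularAuthors[a.AuthorID] = append(s.popularAuthors[a.AuthorID], a)
		return
	}
	for _, follower := range s.followersOf[a.AuthorID] {
		s.feeds[follower] = append(s.feeds[follower], a)
	}
}
```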
It's also interesting to note how other companies solved it. Instagram used a fan-out-on-write approach with Redis, later Cassandra, and eventually a flavor of Cassandra backed by RocksDB. They managed to make a full fan-out approach work through a combination of heavy optimization, a relatively low posting volume (compared to Twitter at least) and a ton of VC money.
Friendster and Hyves are two examples of companies that never really managed to solve this and went out of business (there were probably other factors as well, but still). I also heard one investor mention how Tumblr struggled with technical debt related to their feed. A more recent example is Vero, which basically collapsed under scaling issues.
Other components such as search were probably also quite tricky. One thing I've never figured out is how Facebook handles search on your timeline. That's a seriously complex problem.
LinkedIn recently published a bunch of papers about their feed tech:
https://engineering.linkedin.com/blog/2016/03/followfeed--li...
And Stream's StackShare post is also interesting: https://stackshare.io/stream/stream-and-go-news-feeds-for-ov...
Anyhow, the solution is easier than that (implement lock striping, i.e. https://stackoverflow.com/questions/16151606/need-simple-exp...), but I was reorg'd away before my code hit production, so I'm not sure what happened there.
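For anyone who doesn't want to click through: lock striping just means sharding one hot lock into N locks keyed by a hash of the key, so operations on unrelated keys stop contending. A minimal Go sketch (illustrative only, not the code I had in mind back then):

```go
package striped

import (
	"hash/fnv"
	"sync"
)

// StripedCounter shows lock striping: instead of one mutex guarding one big
// map, the key space is hashed across N independent (mutex, map) pairs, so
// operations on different keys rarely contend for the same lock.
type StripedCounter struct {
	stripes []stripe
}

type stripe struct {
	mu sync.Mutex
	m  map[string]int64
}

func New(n int) *StripedCounter {
	s := &StripedCounter{stripes: make([]stripe, n)}
	for i := range s.stripes {
		s.stripes[i].m = make(map[string]int64)
	}
	return s
}

// stripeFor hashes the key to pick which lock/map pair guards it.
func (s *StripedCounter) stripeFor(key string) *stripe {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &s.stripes[h.Sum32()%uint32(len(s.stripes))]
}

func (s *StripedCounter) Incr(key string) {
	st := s.stripeFor(key)
	st.mu.Lock()
	st.m[key]++
	st.mu.Unlock()
}
```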
[1] https://www.wired.com/2015/09/whatsapp-serves-900-million-us...
https://www.theatlantic.com/technology/archive/2015/01/the-s...
Here is a quick excerpt; this book is filled to the brim with gems like this.
> The final twist of the Twitter anecdote: now that approach 2 is robustly implemented, Twitter is moving to a hybrid of both approaches. Most users’ tweets continue to be fanned out to home timelines at the time when they are posted, but a small number of users with a very large number of followers (i.e., celebrities) are excepted from this fan-out. Tweets from any celebrities that a user may follow are fetched separately and merged with that user’s home timeline when it is read, like in approach 1. This hybrid approach is able to deliver consistently good performance.
Approach 1 keeps a global collection of tweets; to build a user's home timeline you look up everyone they follow, fetch those users' tweets, and merge them sorted by time.
Approach 2 fans each new tweet out into every follower's home timeline, cached like a mailbox, so the timeline is already precomputed by the time it is read.
[1] https://www.amazon.com/Designing-Data-Intensive-Applications...
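To make the hybrid concrete, here is a rough sketch of the read path it describes: the precomputed (approach 2) timeline is merged at read time with tweets pulled from the few celebrities the user follows (approach 1). Types and function names are invented for illustration; this is not Twitter's code.

```go
package timeline

import "sort"

type Tweet struct {
	AuthorID string
	Text     string
	PostedAt int64
}

// LoadHomeTimeline shows the hybrid read path: the precomputed, fanned-out
// timeline (approach 2) is merged at read time with tweets pulled from the
// few celebrities the user follows (approach 1).
func LoadHomeTimeline(
	precomputed []Tweet, // already fanned out on write (approach 2)
	celebrities []string, // high-follower accounts the user follows
	recentTweetsOf func(authorID string) []Tweet, // pull path (approach 1)
) []Tweet {
	merged := append([]Tweet{}, precomputed...)
	for _, c := range celebrities {
		merged = append(merged, recentTweetsOf(c)...)
	}
	// Newest first, as a home timeline is displayed.
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].PostedAt > merged[j].PostedAt
	})
	return merged
}
```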
When I was working on something with similar technical requirements, I also came across this paper (http://jeffterrace.com/docs/feeding-frenzy-sigmod10-web.pdf), which outlines the approach in a more formal manner.
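If I remember the paper correctly, its core idea is to decide push vs. pull per producer/consumer edge based on how often each side is active. Very roughly, in my paraphrase with simplified unit costs:

```go
package frenzy

// Edge describes one producer -> consumer relationship in a feed system.
// Rates are a simplification of the paper's cost model: each push and each
// pull is assumed to cost one unit of work.
type Edge struct {
	ProducerEventRate float64 // items produced per unit time
	ConsumerQueryRate float64 // feed reads per unit time
}

// ShouldPush returns true when materializing (pushing) the producer's items
// into the consumer's feed is expected to be cheaper than pulling them on
// every read, i.e. when the consumer reads at least as often as the producer posts.
func ShouldPush(e Edge) bool {
	return e.ConsumerQueryRate >= e.ProducerEventRate
}
```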
Scaling is fundamentally about a system's ability to support many servers: something is scalable if you can start with one server, grow to 100, 1,000, or 10,000 servers, and get a performance improvement commensurate with the increase in resources.
When people talk about languages scaling, that is a bit silly, because it is really the architecture that determines scalability. One language may be slower than another, but that does not affect the system's ability to add more servers.
Typically one language might be two, three, or even ten times slower than another. But in a highly scalable system, all that means is that you need two, three, or ten times as many servers to handle a given load. Servers aren't free (just ask Facebook), but a well-capitalized company can certainly afford them.
http://www.businessinsider.com/2008/5/why-can-t-twitter-scal...
Purely in terms of the app, I think the largest would be Cookpad. AirBnB never shared their numbers, so I don't know.
https://speakerdeck.com/a_matsuda/the-recipe-for-the-worlds-...
It was thrown when the system was failing at the Rails layer rather than the Apache layer. I believe that Ryan King and I were the last people to ever see the moan cone in production. =)
I am now CEO @ https://fauna.com/, making whales and fails a thing of the past for everybody. We are hiring, if you want to work on distributed databases.
Twitter was a realtime messaging platform (fundamentally not edge-cacheable) that evolved from that CMS foundation. So the reason for the difficult evolution should be clear.
It's not really a coincidence that before Twitter I worked at CNET on Urbanbaby.com, which was also a realtime, threaded, short-message web chat implemented in Rails.
Anyway the point is: use my new project/company https://fauna.com/ to scale your operational data. :-)
Pull down the code some time. Everything and the kitchen sink is in there somewhere. It's a crazy project.