zlacker

[return to "The part of Postgres we hate the most: Multi-version concurrency control"]
1. audioh+1I 2023-04-26 20:41:27
>>andren+(OP)
My main takeaway from this article: as popular as Postgres and MySQL are, and even granting the weight of the legacy systems built on them, it will always require deep expertise and "black magic" to get enough performance and scale out of them for hyper-scale use cases. That justifies the (current) trend toward DBs built for distributed transactions/writes/reads that you don't have to become a surgeon to scale. There are other DBs and DBaaS offerings that, although not OSS, have solved this problem more cost-efficiently than keeping a team of surgeons on staff.
◧◩
2. zie+RI 2023-04-26 20:45:05
>>audioh+1I
I would argue you handle the hyper-scale use case when you are actually at hyper-scale. Trying to prematurely optimize for it is almost always a waste of time, and chances are you will screw it up anyway. Almost nobody gets to that scale. If you do, you have the money and resources to fix the problem(s) at that point.
◧◩◪
3. dalyon+E91 2023-04-26 23:39:25
>>zie+RI
I mean, sort of? There is some subtlety lost in this oft-repeated advice. I've worked at 3 companies now that started on a single RDBMS and outgrew what is reasonable to serve off that architecture. They are consumer scale (tens of millions of users), but not hyperscale (IMHO 100m+). The engineering cost to migrate a complicated, growing company/product off a mono-DB architecture is astounding. Conservatively I'm talking 10+ dev-years of effort at each company. Easily tens of millions of dollars, maybe 100m+. None of them are "finished". It's really, really time-consuming and hard once you have hundreds of tables, hundreds of thousands of lines of code, dozens of teams, etc.

I'm all about avoiding premature optimization, and it's fine to start with a classic Postgres. But please don't cling to it - if you see MVP success and you actually have a reasonable chance of getting to >1 million users (i.e., a successful B2C product), please, please don't wait to refactor your datastore to a more scalable solution. You will pay dearly if you wait too long. Absolutist advice serves no one well here - it really does depend on what your goals are as a company.

◧◩◪◨
4. zie+Lr1 2023-04-27 02:15:50
>>dalyon+E91
Of course subtlety matters, but as you start scaling and noticing pain points, that is when you start working toward fixing them. First you just throw hardware at the problem, and that tends to scale really, really well for a really long time. It's pretty rare, even at very large scale, that you MUST move off of PG; there are plenty of well-tested scaling solutions if you have the $$$ to spend.

10+ dev-years of work for a few hundred tables worries me a lot. My last conversion was about 20 years of data across a few hundred tables, and we got two-way data synchronization running across DB products in about a month with 2 devs. We kept the sync running for over a year in production because we didn't want to force users over to the new system in a big hurry. We only stopped because the license on the old DB product finally expired and nobody wanted to pay for it anymore.
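
To make the shape of that concrete, here is a minimal sketch of a single pass in one direction of that kind of sync, assuming Postgres on the new side and psycopg2-style connections on both ends; the table (accounts), columns, and watermark scheme are hypothetical, not what we actually ran, and a real two-way setup also needs the reverse pass plus per-table conflict rules:

    # One directional pass of an incremental sync: copy rows changed in the
    # legacy DB since the last watermark into the new Postgres DB.
    # Table and column names are made up for illustration.
    import psycopg2

    def sync_accounts(legacy_conn, pg_conn, last_seen):
        """Copy accounts rows modified after `last_seen` into the new database."""
        with legacy_conn.cursor() as src:
            src.execute(
                "SELECT id, email, updated_at FROM accounts WHERE updated_at > %s",
                (last_seen,),
            )
            rows = src.fetchall()

        with pg_conn.cursor() as dst:
            for row_id, email, updated_at in rows:
                # Upsert so the pass is idempotent and safe to re-run on a schedule.
                dst.execute(
                    """
                    INSERT INTO accounts (id, email, updated_at)
                    VALUES (%s, %s, %s)
                    ON CONFLICT (id) DO UPDATE
                      SET email = EXCLUDED.email, updated_at = EXCLUDED.updated_at
                      WHERE accounts.updated_at < EXCLUDED.updated_at
                    """,
                    (row_id, email, updated_at),
                )
        pg_conn.commit()
        # Advance the watermark only as far as what we actually saw.
        return max((r[2] for r in rows), default=last_seen)

The WHERE clause on the upsert is the crude "newer write wins" conflict rule; in practice, deciding those rules per table is the hard part, not the plumbing.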

◧◩◪◨⬒
5. dalyon+KL1 2023-04-27 05:50:16
>>zie+Lr1
Yes, again the common refrain - just throw hardware at it. I/we of course know this, and all the systems I'm referring to did that first, until they couldn't. But you're kind of missing my point - I'm saying that by the time you are noticing scale pain points, it's often too late. Too late insofar as your system has likely grown so much in breadth (complexity, features, subsystems, lines of code, services, etc.) that everything depends on this one DB, and all of it was written assuming every table is accessible to everyone. It becomes a tangled web of data access patterns and tables that is very hard to break apart.

Never mind the other aspect the pat advice doesn't mention - managing a massive single RDBMS is a goddamn nightmare. At very large scale they are fragile, temperamental beasts. Backups, restores, and upgrades all become hard. Migrations become a dark art, often taking down the DB despite your best understanding. Errant queries stall the whole server; tiny subtleties in index semantics do the same. Yes, it's all solvable with a lot of skill, but it ain't a free lunch, that's for sure. And it tends to become a HUGE drag on innovation, as any change to the DB becomes risky.
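
To make that less abstract, here's a minimal sketch (assuming Postgres and psycopg2, with a hypothetical orders(customer_id) index) of the kind of care even a "routine" change needs on a big, busy table. A plain CREATE INDEX blocks writes to the table for the whole build, and a session stuck waiting on a lock queues up everything behind it, so you bound the lock wait and build concurrently instead:

    # Sketch only: add an index on a hot table without stalling writers.
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    conn.autocommit = True  # CREATE INDEX CONCURRENTLY refuses to run inside a transaction

    with conn.cursor() as cur:
        # Give up quickly if we can't get the lock, instead of queueing behind
        # long-running queries and blocking everything that arrives after us.
        cur.execute("SET lock_timeout = '5s'")
        cur.execute("SET statement_timeout = 0")  # the build itself can legitimately take hours
        # CONCURRENTLY trades a slower build (and an invalid index to drop if
        # it fails) for not blocking writes while it runs.
        cur.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_id "
            "ON orders (customer_id)"
        )

And that's one of the easier kinds of change; the point is that every schema change at that scale needs this level of ceremony.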

To your other point: yes, replicating data "like for like" into another RDBMS can be cheap. But in my experience this domain-data extraction is often taken as an opportunity to move it onto a non-RDBMS data store with specific advantages that match that domain, so you don't hit scaling problems again. That takes significantly longer. But yes, I am perhaps unfairly including all the domain-separation and "datastore flavor change" work in those numbers.

◧◩◪◨⬒⬓
6. zie+E53 2023-04-27 14:38:57
>>dalyon+KL1
I think we are basically in agreement about everything, but coming from different perspectives. There is no "right" answer, but premature optimization is almost always the wrong answer.