zlacker

The part of Postgres we hate the most: Multi-version concurrency control

submitted by andren+(OP) on 2023-04-26 17:10:38 | 257 points 143 comments
[view article] [source] [go to bottom]

NOTE: showing posts with links only
◧◩
4. apavlo+rh[view] [source] [discussion] 2023-04-26 18:29:55
>>no_wiz+Uf
No. It would be major surgery on the internals. See the article for my comment on the attempt to do this with the Zheap project:

https://wiki.postgresql.org/wiki/Zheap

◧◩
5. BohanO+Sh[view] [source] [discussion] 2023-04-26 18:31:56
>>no_wiz+Uf
I think there is a new project from the Postgres community that tries to replace the storage engine to solve the inefficiencies caused by MVCC:

https://github.com/orioledb/orioledb

9. Max-Ga+zm[view] [source] 2023-04-26 18:55:57
>>andren+(OP)
MVCC for Amazon Redshift:

(pdf) https://www.redshiftresearchproject.org/white_papers/downloa...

(html) https://www.redshiftresearchproject.org/white_papers/downloa...

I've been told, very kindly, by a couple of people that it's the best explanation they've ever seen. I'd like to get more eyes on it, to pick up any mistakes, and it might be useful in and of itself anyway to readers, as MVCC on Redshift is, I believe, the same as MVCC was on Postgres before snapshot isolation.

◧◩
13. anecdo+pp[view] [source] [discussion] 2023-04-26 19:10:45
>>Max-Ga+zm
Not bad, but I like this one too:

http://www.interdb.jp/pg/pgsql05.html

◧◩
18. gregw2+3y[view] [source] [discussion] 2023-04-26 19:54:08
>>Max-Ga+zm
This paper, at least from my skimming, seems to describe Redshift's historic SERIALIZABLE ISOLATION level, but does not mention Redshift's newer SNAPSHOT ISOLATION capability.

https://aws.amazon.com/about-aws/whats-new/2022/05/amazon-re...

For concurrency scalability, AWS now configures SNAPSHOT ISOLATION by default if you use Redshift Serverless, but non-serverless clusters still default to SERIALIZABLE ISOLATION.
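
In case it's useful: on a provisioned cluster the switch itself is a single statement. A minimal sketch based on Redshift's ALTER DATABASE isolation-level option ("mydb" is a placeholder; check the current docs for restrictions before running it):

    -- Sketch: move a provisioned Redshift database to snapshot isolation.
    -- "mydb" is a placeholder database name.
    ALTER DATABASE mydb ISOLATION LEVEL SNAPSHOT;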

19. vp8989+Ry[view] [source] 2023-04-26 19:58:10
>>andren+(OP)
Am I correct in thinking that PG's MVCC implementation results in a worse story around offloading some mild OLAP workloads to a replica without affecting the primary? Anecdotally, it seems that MySQL handles this better, but I don't understand the internals of either well enough to explain why that is.

https://aws.amazon.com/blogs/database/manage-long-running-re...

◧◩
20. Lambda+Az[view] [source] [discussion] 2023-04-26 20:00:30
>>mmaund+uo
> in the 2010s, it was MongoDB because non-durable writes made it “webscale”

I think this is the best video on that topic: https://www.youtube.com/watch?v=b2F-DItXtZs

◧◩
27. fovc+uI[view] [source] [discussion] 2023-04-26 20:43:16
>>h1fra+DD
https://reorg.github.io/pg_repack/

Dead easy to run and no long-held locks

29. cybera+kL[view] [source] 2023-04-26 20:57:45
>>andren+(OP)
Yup. A lot of heavy users of Postgres eventually hit the same barrier. Here's another take from Uber: https://www.uber.com/blog/postgres-to-mysql-migration/

I had a similar personal experience. In my previous job we used Postgres to implement a task queuing system, and it created a major bottleneck, resulting in tons of concurrency failures and bloat.

And most dangerously, the system failed catastrophically under load. As the load increased, most transactions ended up as concurrency failures, so very little actual work got committed. This increased the number of outstanding tasks, resulting in an even higher rate of concurrency failures.

And this can happen suddenly: one moment the system behaves well, with tasks being processed at a good rate, and the next moment the queue blows up and nothing works.

I re-implemented this system using pessimistic locking, and it turned out to work much better. Even under very high load, the system could still make forward progress.

The downside was having to make sure that no deadlocks could happen.
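
For anyone building something similar on Postgres today: a useful middle ground between optimistic retries and coarse pessimistic locks is to have each worker claim rows with FOR UPDATE SKIP LOCKED. A minimal sketch, assuming a hypothetical tasks(id, payload, status) table rather than the schema from the system described above:

    -- Each worker claims one pending task; rows locked by other workers are
    -- skipped rather than waited on, so workers don't pile up behind each other.
    BEGIN;
    SELECT id, payload
      FROM tasks
     WHERE status = 'pending'
     ORDER BY id
     LIMIT 1
       FOR UPDATE SKIP LOCKED;
    -- ... do the work in the application, then mark the claimed row done
    -- (42 stands in for the id returned by the SELECT):
    UPDATE tasks SET status = 'done' WHERE id = 42;
    COMMIT;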

◧◩
43. apavlo+A21[view] [source] [discussion] 2023-04-26 22:43:34
>>garycl+vj
> Of course at the end of the article they offer a solution - their product (and of course it’s AI enhanced)

We have been working on automatic database optimization using AI/ML for a decade at Carnegie Mellon University [1][2]. This is not a gimmick. Furthermore, as you can see from the many comments here, the problem is not overhyped.

[1] https://db.cs.cmu.edu/projects/ottertune/

[2] https://db.cs.cmu.edu/projects/noisepage/

◧◩◪◨
50. dang+Ba1[view] [source] [discussion] 2023-04-26 23:47:24
>>educti+K41
Perhaps I should explain. What I actually did was suggest that it would be in their interest to take out the bit that some readers were complaining about, because it felt like an ad at the end. Of course they were free not to follow my suggestion.

I admit that's not precisely how I described it in the GP comment, but it never crossed my mind that anyone would care. Commenter objections never fail to surprise!

Edit: I think I was right that it was in their interest as well as all of ours, because earlier the thread was dominated by complaints like this:

https://news.ycombinator.com/item?id=35718321

https://news.ycombinator.com/item?id=35718172

... and after the change, it has been filling up with much more interesting on-topic comments. From my perspective that's a win-win-win, but YMMV.

◧◩
57. thinkx+Qc1[view] [source] [discussion] 2023-04-27 00:06:51
>>fdr+HV
Coming to Postgres, UNDO logs and no vacuum

https://github.com/orioledb/

◧◩◪
81. cipher+MA1[view] [source] [discussion] 2023-04-27 03:49:02
>>nextac+OY
It previously had this closing line, with links to their products (https://web.archive.org/web/20230426171217/https://ottertune...):

> A better approach is to use an AI-powered service automatically determine the best way to optimize PostgreSQL. This is what OtterTune does. We’ll cover more about what we can do in our next article. Or you can sign-up for a free trial and try it yourself.

That was removed after the article was posted to HN, at dang's suggestion - he posted about it elsewhere in these comments.

◧◩◪
82. nextac+SB1[view] [source] [discussion] 2023-04-27 04:02:20
>>apavlo+A21
Okay, so NoisePage appears to be open source: https://github.com/cmu-db/noisepage/

But I can't find the OtterTune GitHub page.

Is any part of OtterTune open source?

◧◩◪◨
83. ncann+VC1[view] [source] [discussion] 2023-04-27 04:13:23
>>endorp+lr1
From what I can find via Google, it seems to be the opposite of that: they acknowledged Postgres's shortcomings on the mailing list:

https://www.reddit.com/r/programming/comments/4vms8x/why_we_...

https://www.postgresql.org/message-id/5797D5A1.5030009%40agl...

◧◩◪◨⬒⬓⬔
96. jabl+oO1[view] [source] [discussion] 2023-04-27 06:13:43
>>maskli+PK1
Also the paper linked from TFA goes into the various implementation options in more detail: https://db.cs.cmu.edu/papers/2017/p781-wu.pdf
◧◩◪
97. bberrr+FP1[view] [source] [discussion] 2023-04-27 06:23:59
>>dmak+hO1
Could it be this? https://www.youtube.com/playlist?list=PLSE8ODhjZXja7K1hjZ01U...
◧◩◪◨⬒
101. fathyb+s02[view] [source] [discussion] 2023-04-27 07:57:00
>>ncann+VC1
I didn't look at the Reddit link, but the full mailing list thread is more nuanced than that: https://www.postgresql.org/message-id/flat/579795DF.10502%40...
◧◩◪
107. barrke+K52[view] [source] [discussion] 2023-04-27 08:37:30
>>Diablo+hi1
That's not how I remember it.

> The Uber guy is right that InnoDB handles this better as long as you don't touch the primary key (primary key updates in InnoDB are really bad).

> This is a common problem case we don't have an answer for yet.

It's still not how I remember it.

Quote from https://www.postgresql.org/message-id/flat/579795DF.10502%40...

I still prefer Postgres by a long way as a developer experience, for the sophistication of the SQL you can write and the smarts in the optimizer. And I'd still pick MySQL for an app which expects to grow to huge quantities of data, because of the path to Vitess.

◧◩◪
108. orthox+M62[view] [source] [discussion] 2023-04-27 08:44:21
>>Diablo+hi1
That's still the PostgreSQL problem: it has insane defaults.

https://www.postgresql.org/docs/current/runtime-config-resou... tells you what all the parameters do, but not why and how to change them.

"If you have a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of the memory in your system." Why not set it to 25% of the memory in my system by default, then?

"Sets the base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files. If this value is specified without units, it is taken as kilobytes. The default value is four megabytes (4MB)." Yes, and? Should I set it higher? When?

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Serv... hasn't been updated for two years and explains only a handful of parameters.

"If you do a lot of complex sorts, and have a lot of memory, then increasing the work_mem parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents." How much is a lot? Do I need to care if I'm running mostly OLTP queries?

"This is a setting where data warehouse systems, where users are submitting very large queries, can readily make use of many gigabytes of memory." Okay, so I need to set it higher if I'm running OLAP queries. But how high is too high?

https://wiki.postgresql.org/wiki/Performance_Optimization is just a collection of blog posts written by random (probably smart) people that may or may not be outdated.

So when someone complains their Postgres instance runs like ass and smug Postgres weenies tell them to git gud at tuning, the weenies should be less smug. If your RDBMS requires extensive configuration to support nontrivial loads, you either make that configuration the default or, if it's significantly different for different load profiles, put a whole section in the manual that covers day 1 and day 2 operations.
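
To make that concrete, here is roughly what applying the two most-quoted rules of thumb looks like. This is only a sketch: the machine (a dedicated 16GB server running mostly OLTP) and the values are illustrative assumptions, not recommendations:

    -- Assumes a dedicated 16GB server with a mostly-OLTP workload.
    ALTER SYSTEM SET shared_buffers = '4GB';  -- ~25% of RAM, the docs' suggested start;
                                              -- only takes effect after a restart
    ALTER SYSTEM SET work_mem = '32MB';       -- per-sort/per-hash budget; one query can
                                              -- use it several times, so raise cautiously
    SELECT pg_reload_conf();                  -- picks up work_mem; shared_buffers still
                                              -- waits for the restart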

◧◩◪
119. zinclo+Mk3[view] [source] [discussion] 2023-04-27 15:43:17
>>avinas+tH1
I took a look at https://github.com/orioledb/orioledb, which is a project attempting to remedy some of Postgres' shortcomings, including its MVCC implementation. It looks like they're doing something similar to MySQL with a redo log, as well as some other optimizations. So maybe this is the answer.
◧◩
124. pmontr+rT3[view] [source] [discussion] 2023-04-27 17:49:26
>>mmaund+uo
In the early days of the web, MySQL was much faster than anything else because it was essentially an ISAM file store with no support for transactions. That was OK for many people who self-hosted the db on the not-very-powerful CPUs of the time.

I remember people stating that transactions are useless, and maybe they are for some workloads; see the success of MongoDB years later.

The transactional engine InnoDB was added in version 3.23 [1] in 2001 [2].

[1] https://en.wikipedia.org/wiki/Comparison_of_MySQL_database_e...

[2] https://en.wikipedia.org/wiki/MySQL

◧◩◪◨
134. fdr+ug5[view] [source] [discussion] 2023-04-28 03:35:37
>>avinas+ma5
In most cases[1], when you update a tuple in Postgres, a new tuple is put somewhere else in the same heap, with different visibility information, "xmin", "xmax". The old tuple remains where it is. Index pointers to it likewise remain unchanged, but a new entry is added for the new tuple. The old version gains an updated "xmax" field indicating that version was deleted at a certain logical point.

Later on, VACUUM has to plow through everything and check the oldest running transaction to see whether the tuple can be "frozen" (old enough to be seen by every transaction, and not yet deleted) or the space reclaimed as usable (deleted and visible to nothing). Index tuples likewise must be pruned at this time.

In systems with an UNDO log, the tuple is mutated in place and the contents of the old version placed into a sequential structure. In the case where the transaction commits, and no existing concurrent repeatable read level transactions exist, the old version in the sequential structure can be freed, rather than forcing the system to fish around doing garbage collection at some later time to obsolete the data. This could be considered "optimized for commit" instead of the memorable "optimized for rollback."

On the read side, however, you need special code to fish around in UNDO (since the copy in the heap is uncommitted data at least momentarily) and ROLLBACK needs to apply the UNDO material back to the heap. Postgres gets to avoid all that, at the cost of VACUUM.

[1] The exception is "HOT" (heap only tuple) chains, which if you squint look a tiny bit UNDO-y. https://www.cybertec-postgresql.com/en/hot-updates-in-postgr...
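
If you want to watch that bookkeeping happen, the xmin/xmax system columns are queryable on any table. A throwaway sketch (table "t" is made up for the example):

    -- Watch a new tuple version appear on UPDATE.
    CREATE TABLE t (id int PRIMARY KEY, val text);
    INSERT INTO t VALUES (1, 'a');
    SELECT xmin, xmax, ctid, * FROM t;    -- xmax = 0: no transaction has deleted/updated it
    UPDATE t SET val = 'b' WHERE id = 1;
    SELECT xmin, xmax, ctid, * FROM t;    -- new xmin and ctid: the update wrote a new tuple;
                                          -- the old version lingers until VACUUM reclaims it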

◧◩◪
140. runlas+HZ7[view] [source] [discussion] 2023-04-28 22:07:17
>>dmak+hO1
https://youtu.be/oeYBdghaIjc

There are multiple years available for his first DB class, but that's the one I watched. I almost called it his ‘basic’ class, but there are literally only one or two classes on SQL before he dives into all the various layers of the internals.

There’s also a few of his advanced courses. And then you’ll see guest lectures from industry on about every one of the new DB platforms you can think of.

They’re all under “CMU Database Group” on Youtube.

Highly recommend.

◧◩◪◨
142. lmwnsh+dWc[view] [source] [discussion] 2023-04-30 21:16:31
>>runlas+HZ7
You can do the projects (the autograder is public) and join a Discord community of non-CMU people who are following along too! E.g., [0] for Fall 2022.

[0] https://15445.courses.cs.cmu.edu/fall2022/faq.html#q8

143. umairs+8oe[view] [source] 2023-05-01 13:07:41
>>andren+(OP)
Here is a rebuttal of many of the points raised by Uber against PostgreSQL: https://www.2ndquadrant.com/en/blog/thoughts-on-ubers-list-o...
[go to top]