zlacker

One of the weird things about Postgres MVCC is that it is "optimized for rollback," as one person memorably quipped to me. This is not to imply a design principle, it's more a description of how things ended up, and the general argument behind this quip is Postgres lacks "UNDO" segments.

On the one hand, this does make the model Postgres uses admirably simple: the WAL is all "REDO," and the heap is all you need to accomplish any kind of read, but at the expense that stuff that normally would be copied off to a sequential UNDO log and then vaporized when the transaction commits and all possible readers have exited remains comingled with everything else in the main database heap, needing to be fished out again by VACUUM for purging and figuring out how to reclaim numerical space for more transactions.

There may be other solutions to this, but it's one unusual quality Postgres has relative to other MVCC databases, many of which sport an UNDO log.

There are downsides to UNDO, however: if a read needs an old copy of the tuple, it needs to fish around in UNDO, all the indices and synchronization need to account for this, and if there's a rollback or crash recovery event (i.e. mass-rollback of all transactions open at the time), everything has to be shuffled back into the main database storage. Hence the memorable initial comment: "Postgres is optimized for rollback."

replies(1): >>thinkx+9h

>>fdr+(OP)
Coming to Postgres, UNDO logs and no vacuum

https://github.com/orioledb/

replies(1): >>avinas+Fe4

>>thinkx+9h
what is an UNDO log and how does it solve the problem?

replies(1): >>fdr+Nk4

>>avinas+Fe4
In most cases[1], when you update a tuple in Postgres, a new tuple is put somewhere else in the same heap, with different visibility information, "xmin", "xmax". The old tuple remains where it is. Index pointers to it likewise remain unchanged, but a new entry is added for the new tuple. The old version gains an updated "xmax" field indicating that version was deleted at a certain logical point.

Later on, VACUUM has to plow through everything and check the oldest running transaction to see whether the tuple can be "frozen" (old enough to be seen by every transaction, and not yet deleted) or the space reclaimed as usable (deleted and visible to nothing). Index tuples likewise must be pruned at this time.

In systems with an UNDO log, the tuple is mutated in place and the contents of the old version placed into a sequential structure. In the case where the transaction commits, and no existing concurrent repeatable read level transactions exist, the old version in the sequential structure can be freed, rather than forcing the system to fish around doing garbage collection at some later time to obsolete the data. This could be considered "optimized for commit" instead of the memorable "optimized for rollback."

On the read side, however, you need special code to fish around in UNDO (since the copy in the heap is uncommitted data at least momentarily) and ROLLBACK needs to apply the UNDO material back to the heap. Postgres gets to avoid all that, at the cost of VACUUM.

[1] The exception is "HOT" (heap only tuple) chains, which if you squint look a tiny bit UNDO-y. https://www.cybertec-postgresql.com/en/hot-updates-in-postgr...