I'm working on delivering a Postgres-based job system right now; jobs cycle through states defined by an ENUM, eventually landing on a terminal state. Worker jobs (containers on a cluster) don't directly manipulate the job state in the table; a controller system does that. Each controller in the (3-node) cluster holds 2 connections to Postgres. Old jobs are DELETE'd once they're "long ago enough".
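For concreteness, here's a minimal sketch of how I picture the controller side of that, using psycopg2 and entirely hypothetical names (a `jobs` table, a `job_state` enum, terminal states 'succeeded'/'failed') rather than the real schema:

```python
# Sketch only: table name, enum values, and retention window are assumptions,
# not the actual design described above.
import psycopg2

DDL = """
CREATE TYPE job_state AS ENUM ('pending', 'running', 'succeeded', 'failed');
CREATE TABLE jobs (
    id          bigserial   PRIMARY KEY,
    state       job_state   NOT NULL DEFAULT 'pending',
    payload     jsonb       NOT NULL,
    updated_at  timestamptz NOT NULL DEFAULT now()
);
"""

def transition(conn, job_id: int, from_state: str, to_state: str) -> bool:
    """Controller-side state change: the UPDATE only succeeds if the row is
    still in the expected source state, so two controllers can't both apply
    the same transition."""
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE jobs SET state = %s, updated_at = now() "
            "WHERE id = %s AND state = %s",
            (to_state, job_id, from_state),
        )
        return cur.rowcount == 1

def purge_old_jobs(conn, older_than: str = "30 days") -> int:
    """Periodic cleanup: DELETE terminal-state rows past a retention window."""
    with conn.cursor() as cur:
        cur.execute(
            "DELETE FROM jobs WHERE state IN ('succeeded', 'failed') "
            "AND updated_at < now() - %s::interval",
            (older_than,),
        )
        return cur.rowcount
```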
Even before we addressed deadlocks caused by doing too much per transaction, initial load testing suggested that the database was not the limiting factor on system throughput; worker throughput was. Initial load is estimated at under 500 jobs/day (yawn), but pushing the load to 100K/day didn't change the outcome, though it did mildly annoy the cluster admin.
One key reason I prefer the state-machine/enum approach is that it's logically obvious. At some scale I'm sure it would have to change; I'm not sure how many concurrent mutations to separate rows a Postgres table can tolerate, but that serves as a hard upper bound.
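To illustrate the "logically obvious" part: the whole set of legal transitions fits in one small mapping the controller can check before touching the row. Again, the state names below are placeholders, not the real enum values:

```python
# The full state machine in one mapping; anything outside it is a bug.
ALLOWED_TRANSITIONS = {
    "pending":   {"running"},
    "running":   {"succeeded", "failed"},
    "succeeded": set(),   # terminal
    "failed":    set(),   # terminal
}

def assert_legal(from_state: str, to_state: str) -> None:
    """Guard a controller would apply before issuing the UPDATE shown earlier."""
    if to_state not in ALLOWED_TRANSITIONS.get(from_state, set()):
        raise ValueError(f"illegal transition: {from_state} -> {to_state}")
```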
Author: what kind of volume do you tolerate with this kind of design?