zlacker

[return to "Transactionally Staged Job Drains in Postgres"]
1. memrac+uI 2017-09-20 19:40:44
>>johns+(OP)
I think it is great that PostgreSQL is strong enough to let people build robust queuing systems on top of it, but I still think you are better off in the long run using a real message queuing system like RabbitMQ for this job.

Start out by running RabbitMQ on the same server as PostgreSQL, but limit its use of cores and RAM. Then, as your business grows, you can easily scale to a separate RabbitMQ server, then to a cluster of MQ servers, and eventually to a distributed RabbitMQ service using clusters in multiple data centers, with global queues synchronized by a RabbitMQ plugin such as federation.
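
For the co-hosting stage, the resource cap is a couple of config lines. A sketch, assuming a RabbitMQ recent enough to read the new-style rabbitmq.conf; the exact limits are placeholders to tune:

    # rabbitmq.conf -- raise the memory alarm (and block publishers)
    # at 20% of system RAM, leaving the rest to PostgreSQL
    vm_memory_high_watermark.relative = 0.2

    # rabbitmq-env.conf -- cap the Erlang VM at 2 scheduler threads,
    # i.e. roughly 2 cores
    RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 2:2"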

The benefit of using RabbitMQ is that you begin to learn how message queuing fits into a system architecture, and that you will not run into corner cases and weird behaviors, as long as you heed the advice above and move to a dedicated RabbitMQ server when your usage gets large enough.

An additional benefit is that once you learn to integrate functionality through a message queue (the actor model) rather than through a link editor, you can avoid the monolithic big-ball-of-mud problem entirely and easily integrate both monolithic functions and microservices in your app.

Background jobs are just one part of what a robust message queue gives you. In my opinion, the desire for background jobs is a design smell that indicates a flaw in your architecture which you can fix by adding a message queue system.

2. atombe+zo1 2017-09-21 02:16:12
>>memrac+uI
As much as I like queues, RabbitMQ has some downsides compared to a database.

First, you get zero visibility into what's in the queue. There's no way to peek inside a queue without taking messages off it. Say one of the fields in your messages is customer_id: there's no way to get a count of how many waiting messages relate to customer 123.
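
With a table-backed queue, by contrast, that question is a single query. A sketch -- the "jobs" table and its columns here are hypothetical:

    import psycopg2  # assumes a hypothetical "jobs" table with
                     # customer_id and status columns

    conn = psycopg2.connect("dbname=app")
    with conn.cursor() as cur:
        # "how many messages are waiting for customer 123?" -- trivial in
        # SQL, impossible to ask of a Rabbit queue without consuming it
        cur.execute(
            "SELECT count(*) FROM jobs"
            " WHERE customer_id = %s AND status = 'pending'",
            (123,),
        )
        print(cur.fetchone()[0])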

This leads to the next problem: if customer_id is something you want to partition by, you could create one queue per customer and then use a routing key to route the messages. But Rabbit queues are rigid rather than fluid; it's pretty inconvenient to move stuff between queues. So if you have one queue and you want to split it into N queues, the only way is to drain the queue and republish each message back to the exchange. Rabbit provides no command-line or management tools to do this, and neither does any third party that I know of.
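
If you're ever stuck doing it, the drain-and-republish loop looks roughly like this. A sketch with pika; the queue and exchange names and the customer_id header are hypothetical:

    import pika  # sketch: drain one queue, republish through the exchange

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # Pull messages off the old combined queue one at a time and republish
    # them to the exchange, whose new per-customer bindings re-route them.
    while True:
        method, props, body = ch.basic_get("tasks.all", auto_ack=False)
        if method is None:
            break  # queue is drained
        ch.basic_publish("tasks",
                         routing_key=str(props.headers["customer_id"]),
                         body=body)
        ch.basic_ack(method.delivery_tag)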

Lastly, Rabbit deletes acked messages. To get any visibility into the history of your processing -- or indeed to play back old messages -- you have to build that into your topology and apps, e.g. by having an exchange that duplicates every message into an extra queue, plus a consumer that drains that queue into a database table or log file.
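
The tap itself is only a few lines of topology; it's the extra moving parts around it that you're on the hook for. A sketch, all names hypothetical:

    import pika  # sketch: duplicate every message into an archive queue

    ch = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()

    # Publish through a fanout exchange instead of straight to a queue;
    # a second binding copies every message into tasks.archive.
    ch.exchange_declare("tasks", exchange_type="fanout")
    ch.queue_declare("tasks.work")
    ch.queue_declare("tasks.archive")
    ch.queue_bind("tasks.work", "tasks")
    ch.queue_bind("tasks.archive", "tasks")
    # ...then a consumer drains tasks.archive into a table or log file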

I very much like the "log" approach to queueing, as popularized by Apache Kafka. However, Kafka has its own issues, and sometimes a database table is better.

The pattern I like is to use Rabbit purely for queue orchestration. Make a task table and use NOTIFY to signal that a row has been added (with the row's ID as payload); have a worker LISTEN and stuff each task's ID into Rabbit. Consumers then get the Rabbit message, read (and lock!) the corresponding task row, perform the task, and mark it done. If you need to replay or retry failed tasks, just use SQL to emit the NOTIFYs again.
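
The whole thing fits in a page. A sketch with psycopg2 and pika; the tasks table, the task_created channel, the queue name, and perform() are all hypothetical, and orchestrate() and consume() would each run as their own process:

    import select

    import pika
    import psycopg2


    def perform(payload):
        print("working on", payload)  # stand-in for the real task handler


    def orchestrate():
        # LISTEN for new task IDs and push each one into Rabbit.
        pg = psycopg2.connect("dbname=app")
        pg.autocommit = True  # LISTEN delivery needs no open transaction
        ch = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
        ch.queue_declare("tasks")
        with pg.cursor() as cur:
            # producers run: INSERT INTO tasks ...; NOTIFY task_created, '<id>'
            # (or fire the NOTIFY from an INSERT trigger)
            cur.execute("LISTEN task_created")
        while True:
            if select.select([pg], [], [], 5.0) == ([], [], []):
                continue  # timed out; keep waiting
            pg.poll()
            while pg.notifies:
                note = pg.notifies.pop(0)
                ch.basic_publish(exchange="", routing_key="tasks",
                                 body=note.payload.encode())


    def consume():
        # Take an ID from Rabbit, lock the row, do the work, mark it done.
        pg = psycopg2.connect("dbname=app")  # transactional this time
        ch = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
        ch.queue_declare("tasks")

        def handle(channel, method, properties, body):
            task_id = int(body.decode())
            with pg:  # one transaction: the row lock holds until commit
                with pg.cursor() as cur:
                    cur.execute(
                        "SELECT payload FROM tasks"
                        " WHERE id = %s AND status = 'pending'"
                        " FOR UPDATE SKIP LOCKED",  # 9.5+: don't block on busy rows
                        (task_id,))
                    row = cur.fetchone()
                    if row:
                        perform(row[0])
                        cur.execute("UPDATE tasks SET status = 'done'"
                                    " WHERE id = %s", (task_id,))
            channel.basic_ack(method.delivery_tag)

        ch.basic_consume(queue="tasks", on_message_callback=handle)
        ch.start_consuming()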
