zlacker

I think it is great that PostgreSQL is strong enough to allow people to build robust queuing systems, but I still think that you are better off in the long run to use a real message queuing system like RabbitMQ to do this job.

Start out by running RabbitMQ on the same server as PostgreSQL but do limit its use of cores and RAM. Then when your business grows you can easily scale to a separate RabbitMQ server, to a cluster of MQ servers and to a distributed RabbitMQ service using clusters in multiple data centers with global queues synchronized using a RabbitMQ plugin.

The benefit of using RabbitMQ is that you begin to learn how message queuing fits into a system architecture and that you will not run into corner cases and weird behaviors as long as you heed the advice of moving to a dedicated RabbitMQ server when your usage gets large enough.

An additionally benefit is that when you learn how to integrate functionality by using a message queue (actor model) rather than a link editor, you can avoid the monolithic big ball of mud problem entirely and easily integrate both monolithic functions and microservices in your app.

Background jobs are just one part of what a robust message queue gives you. In my opinion, the desire for background jobs is a design smell that indicates a flaw in your architecture which you can fix by adding a message queue system.

replies(5): >>brandu+e2 >>borplk+73 >>draagl+q3 >>netcra+bb >>atombe+5G

>>memrac+(OP)
I appreciate the writeup!

One thing that I probably should have been clearer on: I used Sidekiq as a queue example here, but this pattern generalizes to anything. RabbitMQ is just as plausible.

> Background jobs are just one part of what a robust message queue gives you. In my opinion, the desire for background jobs is a design smell that indicates a flaw in your architecture which you can fix by adding a message queue system.

Possibly ... one thing to consider is that a fair number of us are writing various types of web services, and in web services there are so many obvious tasks that should be moved out of band to a background job. It's not even so much about distributing workload (although that's a nice aspect) as it is about moving expensive operations out-of-band so that a user's request finishes faster.

Here's a couple examples:

* Process an uploaded image into a variety of thumbnail sizes.

* Fire a webhook.

* Send a welcome email.

* Duplicate added/updated information to an internal microservice.

* Start a file download (like synchronizing a remote RSS feed).

In all these cases there's no reason for the request that initiated the action to wait on the job being finished. Moving the work to the background is purely a win from the user's perspective.

>>memrac+(OP)
A queue is good for relieving the burden of processing the correct items from the list (so you don't have to query the database to know what to process next).

Apart from that would say that you should still have a serious ACID DB like Postgres for state management.

Then push the unique identifier of the task to the queue, and let the actual data live on the database.

Queues are good for being queues, not for being data stores.

I have seen some horrific cases of people mistaking a queue software for a database.

>>memrac+(OP)
Bonus: you can use pg_amqp[1] to publish to RabbitMQ with transactional semantics

1: https://github.com/omniti-labs/pg_amqp

>>memrac+(OP)
I tend to agree with you but I haven't found any good way to put things in a queue from within postgres - from a trigger for instance. Doing so would open up a lot of possibilities - do you have any suggestions, even if its just for things to google?

replies(1): >>Deimor+6e

>>netcra+bb
I haven't personally used either of these, but they look interesting and I'm hoping to test them out at some point:

https://github.com/gmr/pgsql-listen-exchange

https://github.com/subzerocloud/pg-amqp-bridge

In theory, these let you use postgres NOTIFY to add messages to queues (which can be done from inside triggers).

>>memrac+(OP)
As much as I like queues, RabbitMQ has some downsides compared to a database.

First, you get zero visibility into what's in the queue. There's literally no way to peek inside a queue without taking messages from it. Let's say one of the fields of your messages is customer_id. There's no way to get a count of how many messages are waiting that are related to customer 123.

This leads to the next problem: If the customer_key is something you want to partition by, you could create one queue per customer and then use a routing key to route the messages. But Rabbit queues are very rigid, as opposed to fluid. It's pretty inconvenient to move stuff between queues. So if you have one queue, and you want to split it into N queues, the only way is to drain the queue and republish each message back to the exchange. Rabbit provides no command line or management tools to do this, and neither does anyone else that I know.

Lastly, Rabbit deletes acked messages. To get any visibility into the history of your processing -- or indeed play back old messages -- you have to build that into your topology/apps, e.g. by having an exchange that dupes all messages into a queue and then run a consumer that drains it into a database table or log file.

I much like the "log" approach to queueing, as popularized by Apache Kafka. However, Kafka has its issues, and sometimes a database table is better.

The pattern I rather like to use is to use Rabbit purely for queue orchestration. Make a task table, use NOTIFY to signal that a row has been added (with ID as payload), have a worker use LISTEN and stuff each task's ID into Rabbit. Then have consumers get the Rabbit message, read (and lock!) the corresponding task, perform the task, then mark the task as done. If you need to replay or retry failed tasks, just use SQL to emit NOTIFYs again.