zlacker

1. static+ (OP) 2021-12-18 00:42:21
So SKIP LOCKED is a pretty well-worn optimization at this point. People have been doing this for a while.
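
For reference, the basic pattern looks roughly like this (table and column names are made up for illustration):

    -- Claim one pending job and mark it running in a single statement.
    -- Concurrent workers skip rows that are already locked instead of
    -- blocking on them, so they never fight over the same job.
    UPDATE jobs
       SET status = 'running'
     WHERE id = (SELECT id
                   FROM jobs
                  WHERE status = 'pending'
                  ORDER BY id
                  LIMIT 1
                  FOR UPDATE SKIP LOCKED)
    RETURNING id, payload;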

What kind of TPS are people seeing on queues built on Postgres?

edit: https://gist.github.com/chanks/7585810

~10k/s in that benchmark, but that was on a Postgres from years ago; there have been four or five major versions since then, I think.

That's a decent number, and I'm betting newer versions can push it higher. Also, a queue is trivially sharded, since messages are entirely isolated from one another.

That said, Kafka can do hundreds of thousands, if not millions, of messages per second, so clearly there's room for optimization.

replies(1): >>ericho+qa
2. ericho+qa 2021-12-18 02:12:20
>>static+(OP)
You are correct that things have improved further.

You can now efficiently partition your job queue, just like you would with Kafka, to get higher scalability. You then "prepare" your "dequeue" query in Postgres, and the planner will only look at the relevant partition, pruning all of the others. It's like having one logical queue to insert into and hundreds of actual physical queues, transparently partitioned, to pull from. You then assign your workers to specific partitions and plow through your jobs.
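
A rough sketch of the setup, with made-up table, column, and partition names (ours looks different, but this is the shape):

    -- One logical queue, hash-partitioned into N physical queues.
    CREATE TABLE jobs (
        id          bigserial,
        queue       int         NOT NULL,
        status      text        NOT NULL DEFAULT 'pending',
        enqueued_at timestamptz NOT NULL DEFAULT now(),
        payload     jsonb       NOT NULL
    ) PARTITION BY HASH (queue);

    CREATE TABLE jobs_p0 PARTITION OF jobs FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    -- ...repeat for the other remainders; raise the modulus for more partitions.

    -- Producers just INSERT INTO jobs. Each worker dequeues with its
    -- partition key pinned, so the planner prunes every other partition.
    PREPARE dequeue (int) AS
    UPDATE jobs
       SET status = 'running'
     WHERE queue = $1
       AND id = (SELECT id
                   FROM jobs
                  WHERE queue = $1
                    AND status = 'pending'
                  ORDER BY id
                  LIMIT 1
                  FOR UPDATE SKIP LOCKED)
    RETURNING id, payload;

    EXECUTE dequeue(0);  -- this worker is assigned to partition 0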

In PG 14, you can reasonably have a thousand partitions on a single job queue, each virtually independent performance-wise. As a bonus, you can add a second level of partitioning keyed on the task's timestamp. Older partitions (and the tasks in them) can then be detached and dropped without ever running a DELETE, which is a fast operation in Postgres. Any index or table bloat in those older partitions disappears with them. Pretty sweet.
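
Continuing the sketch from above: in the two-level variant, instead of the flat partitions, each hash partition is itself range-partitioned on the timestamp (weekly here, but the granularity and names are just illustrative), and retiring old jobs becomes a detach-and-drop rather than a DELETE:

    -- Each hash partition is itself range-partitioned on enqueued_at.
    CREATE TABLE jobs_p0 PARTITION OF jobs
        FOR VALUES WITH (MODULUS 4, REMAINDER 0)
        PARTITION BY RANGE (enqueued_at);

    CREATE TABLE jobs_p0_2021w51 PARTITION OF jobs_p0
        FOR VALUES FROM ('2021-12-20') TO ('2021-12-27');

    -- An older weekly partition (created earlier by whatever job manages
    -- these) is retired without touching individual rows; its table and
    -- index bloat goes away with it.
    ALTER TABLE jobs_p0 DETACH PARTITION jobs_p0_2021w45;
    DROP TABLE jobs_p0_2021w45;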

Obviously, this takes more work to set up, and there's ongoing operational work (e.g. a cron job to create new partitions and drop old ones), but you retain all of the transactional guarantees of Postgres in the process, and the performance is quite good. I like the overall operational simplicity of having everything in a single, inexpensive, rock-solid RDBMS, and I like getting transaction-safe jobs "for free."

It's so cheap, too. People in this thread talk about using SQS. I spend ~$1,600/month on SQS to process 20M messages per day, going on three years now. I can do far more than that on our Postgres instance: a $5K machine bought two years ago, sitting in our co-lo in downtown LA with a cheap 10 Gbps Cogent connection; that setup also runs the rest of the business ($2,300/month for the entire rack of machines plus Internet).

But I'm forced to pay for that damn SQS queue because that's how a business partner gets us data. Such a waste of money for something so cheap and easy to do reliably with Postgres. I've now spent over $50K on something I can do more or less for free on a 2012 Dell server. Such is business.
