But once you get up to even, say, 40 messages per second with 100 worker processes, you're already at 4,000 updates per second just to settle which worker got to claim which job, and anything beyond that quickly becomes untenable.
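To make that arithmetic concrete, here's a minimal sketch, assuming a race-to-claim scheme where every worker attempts to grab every job (the table and column names are illustrative, not from any particular system):

    -- each worker runs this for each job it sees; only one wins the race
    UPDATE jobs
       SET claimed_by = :worker_id,
           status     = 'running'
     WHERE id = :job_id
       AND claimed_by IS NULL;
    -- 1 row affected: we claimed it; 0 rows: another worker beat us

With 100 workers each racing for 40 jobs per second, that statement runs roughly 40 x 100 = 4,000 times per second, and nearly all of those executions lose the race.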
That’s what our workload is like for our SaaS code analysis platform. We create a few tasks (~10 max) for every customer submission (usually triggered by a code push). We replaced Kafka with a PostgreSQL table a couple of years ago.
We made the schema, functions, and Grafana dashboard open source [0]. I think it's slightly out of date but mostly matches what we now have in production, and it has been running flawlessly.
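For anyone who hasn't seen the pattern, here's a rough sketch of the general shape of a Postgres-backed queue -- illustrative only, and not necessarily how the schema in [0] is laid out:

    CREATE TABLE task_queue (
        id         bigserial   PRIMARY KEY,
        payload    jsonb       NOT NULL,
        status     text        NOT NULL DEFAULT 'pending',
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- each worker claims at most one pending task; FOR UPDATE SKIP LOCKED
    -- lets concurrent workers skip rows another transaction is claiming
    UPDATE task_queue
       SET status = 'running'
     WHERE id = (
             SELECT id
               FROM task_queue
              WHERE status = 'pending'
              ORDER BY id
              LIMIT 1
                FOR UPDATE SKIP LOCKED
           )
     RETURNING id, payload;

The SKIP LOCKED part is what keeps claiming cheap: each job ends up claimed by exactly one UPDATE, so write volume scales with the number of jobs rather than the number of workers.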
of _course_ the system architecture had to have a job queue and it had to be highly available (implemented with a rabbitmq cluster)
what we learned after a few months in production was that the only time the rabbitmq cluster had outages was when it got confused* and incorrectly decided there was a network partition, flipped into partition recovery mode, and caused a partial outage until production support could manually recover the cluster
the funny thing is that our job throughput was incredibly low, and we would have had better availability if we had skipped the temperamental rabbitmq cluster and instead implemented the job queue in the non-HA postgres instance that was already in the design --- if our postgres server went down then the whole app was stuffed anyway!
* this was a rabbitmq defect that showed up when TLS encryption was enabled and very large messages were pushed through the queue -- rabbitmq would be so busy encrypting/decrypting the large messages that it would miss heartbeats, then time out, decide the missing heartbeats meant a network partition, and trigger an outage that needed manual recovery. i think rabbitmq fixed that a few years back