We implemented a similar design to Que for a specific use case in our application that has a known low volume of jobs and for a variety of reasons benefits from this design over other solutions.
[1]: https://github.com/que-rb/que
[2]: https://brandur.org/postgres-queues
0. https://www.cloudamqp.com/blog/why-is-a-database-not-the-rig...
EDIT>> I am not suggesting people build their own rabbitmq infrastructure. Use a cloud service. The article is informational only.
The article below goes over some downsides to using Postgres for a job queue: namely that the data model isn't optimized for finding new jobs and sending them to large numbers of subscribers. Actual messaging systems like Kafka/NATS are better at this.
https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...
Also, there are things that the dedicated messaging systems give you that postgres won't, such as transparent error handling and retry logic. If your worker grabs a job from postgres and fails (in some kind of transient way such that the task should be retried), you'll have to implement the retry logic yourself. A streaming platform like Kafka or NATS will notice that you haven't acknowledged the message and deliver it to someone else.
This is something you could pretty easily implement yourself with something like a process that just scans over the "processing" jobs and looks for timeouts, resetting them as it goes. But there are probably a few of these little papercuts that Kafka-like (…Kafkaesque?) systems would handle for you.
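For a concrete picture, here's a minimal sketch of that kind of sweeper in Python with psycopg2. The jobs table, its status/started_at columns, and the 10-minute timeout are all made up for the example, not taken from anyone's actual system:

    # Hypothetical sweeper: put "processing" jobs that have been running too
    # long back into "pending" so another worker can retry them.
    import time
    import psycopg2

    RESET_SQL = """
    UPDATE jobs
       SET status = 'pending', started_at = NULL
     WHERE status = 'processing'
       AND started_at < now() - interval '10 minutes'
    """

    def sweep(dsn):
        conn = psycopg2.connect(dsn)
        try:
            while True:
                with conn, conn.cursor() as cur:  # commits (or rolls back) each pass
                    cur.execute(RESET_SQL)
                    print(f"reset {cur.rowcount} timed-out jobs")
                time.sleep(60)  # sweep once a minute
        finally:
            conn.close()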
So, I guess if you already have postgres set up in a reliable way, and there's no one around whose job it is to set up new production systems like Kafka, and you don't already have something like Kafka, and you only need basic job queue requirements or you're okay with implementing whatever you need on top of postgres yourself, and your incoming job rate is fewer than maybe a thousand per second, and you have fewer than maybe tens of consumers… postgres is probably a decent job queue.
The above paragraph seems snide but it probably does describe many people's environments.
For high throughput (we had ad-tech servers handling 1E7 hits/s) we used a home-built low-latency queue that supported both real-time and persisted data. But for low-throughput stuff, the database-as-a-queue (DBaaQ) worked fine.
And ultimately, maybe it was a lack of imagination on our part since Segment was successful with a mid-throughput DBaaQ https://segment.com/blog/introducing-centrifuge/
- The service mentioned (now called https://webapp.io) eventually made it into YC (S20) and still uses Postgres as its pub/sub implementation, doing hundreds of thousands of messages per day. The Postgres instance now runs on 32 cores and 128 GB of memory and has scaled well.
- We bolstered Postgres's NOTIFY with Redis pub/sub for high-traffic code paths, but it's been nice having ACID guarantees as the default for less popular paths (e.g., webhook handling)
- This pattern only ever caused one operational incident, where a transaction held a lock, which caused the notification queue to start growing and eventually (silently) stop sending messages. Starting Postgres with statement_timeout=(a few days) was enough to solve this.
Previous discussion: https://news.ycombinator.com/item?id=21484215
Happy to answer any questions!
'easy to learn and use' is a downright lie.
edit: link to this same topic being discussed a few weeks ago: https://news.ycombinator.com/item?id=28903614#28904103
One could also use DNS TXT as an RDBMS with some interesting fault tolerance and distribution schemes. That doesn't mean it's a good idea or the best way to solve a problem.
If you haven't seen them already, the Jepsen analyses are really worth a read: https://aphyr.com/posts/293-jepsen-kafka https://aphyr.com/posts/282-jepsen-postgres https://aphyr.com/tags/jepsen
One of the biggest advantages comes when you start thinking about them in terms of transactions. Transactional guarantees are really useful here: guarantee that a message will be written to the queue if the transaction commits successfully, and guarantee that a message will NOT be written to the queue otherwise.
https://brandur.org/job-drain describes a great pattern for achieving that using PostgreSQL transactions.
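Roughly, the enqueue side of that looks like this in Python with psycopg2; the users/jobs tables and column names here are invented for the example, not taken from the article:

    # Enqueue a job in the same transaction as the business write, so the job
    # row exists if and only if the user row was committed.
    import json
    import psycopg2

    def create_user_and_enqueue_welcome_email(dsn, email):
        conn = psycopg2.connect(dsn)
        try:
            with conn, conn.cursor() as cur:  # one transaction for both inserts
                cur.execute(
                    "INSERT INTO users (email) VALUES (%s) RETURNING id", (email,)
                )
                user_id = cur.fetchone()[0]
                cur.execute(
                    "INSERT INTO jobs (kind, payload) VALUES (%s, %s)",
                    ("send_welcome_email", json.dumps({"user_id": user_id})),
                )
        finally:
            conn.close()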
I actually wrote StarQueue for MySQL, Microsoft SQL Server and also Postgres - they all work just fine as a message queue back end, because they all support SKIP LOCKED.
I started out including Oracle as a supported database but found that I loathed Oracle's complexity.
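For anyone who hasn't seen it, a SKIP LOCKED dequeue on Postgres looks roughly like this (a psycopg2 sketch; the jobs table and its columns are assumptions, not StarQueue's actual schema):

    # Claim one pending job; concurrent workers skip rows that are already
    # locked instead of blocking on them.
    CLAIM_SQL = """
    UPDATE jobs
       SET status = 'processing', started_at = now()
     WHERE id = (
             SELECT id
               FROM jobs
              WHERE status = 'pending'
              ORDER BY id
              LIMIT 1
                FOR UPDATE SKIP LOCKED
           )
    RETURNING id, payload
    """

    def claim_one_job(conn):
        # conn is an open psycopg2 connection
        with conn, conn.cursor() as cur:
            cur.execute(CLAIM_SQL)
            return cur.fetchone()  # None if no pending job was available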
What kind of TPS are people seeing on queues based on psql?
edit: https://gist.github.com/chanks/7585810
10k/s here, but that was on a postgres from years ago - there have been like 4 or 5 major versions since then I think.
That's a good amount, and I'm betting you could push it further. Also, a queue is trivially sharded, since messages are entirely isolated from one another.
That said, Kafka can do hundreds of thousands if not millions of messages per second, so clearly there's room for optimizations.
That’s what our workload is like for our SaaS code analysis platform. We create a few tasks (~10 max) for every customer submission (usually triggered by a code push). We replaced Kafka with a PostgreSQL table a couple of years ago.
We made the schema, functions, and Grafana dashboard open source [0]. I think it’s slightly out-of-date but mostly the same as what we have now in production, and has been running perfectly.
> Listen to changes in a PostgreSQL Database and broadcasts them over WebSockets
[1]: https://github.com/supabase/realtime
Disclosure: I'm a Supabase employee.
> With the libpq library, the application issues LISTEN as an ordinary SQL command, and then must periodically call the function PQnotifies to find out whether any notification events have been received.
[1]: https://www.postgresql.org/docs/current/sql-listen.html
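In Python that loop looks something like this with psycopg2 (which wraps libpq); the channel name is made up for the example:

    # LISTEN on a channel and poll the connection for notifications, which is
    # the PQnotifies step described in the docs above.
    import select
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect("dbname=app")
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

    with conn.cursor() as cur:
        cur.execute("LISTEN jobs_channel;")

    while True:
        # Block until the socket is readable (or 60s pass), then drain any
        # queued notifications.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            print(f"NOTIFY on {notify.channel}: {notify.payload}")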
> graphile-worker is horizontally scalable. Each instance has a customisable worker pool, this pool defaults to size 1 (only one job at a time on this worker) but depending on the nature of your tasks (i.e. assuming they're not compute-heavy) you will likely want to set this higher to benefit from Node.js' concurrency.
It's really simple. I used this snippet for a real-time application which can react to any kind of change in the database instantly.
And Elixir is especially well suited for this type of workload. I actually extracted out much of the Supabase Realtime library so that I could work with the data directly in Elixir[2]
[1]: https://github.com/supabase/realtime#why-not-just-use-postgr...
[1]: https://dramatiq.io/
That means you can't have Docker and different versions of Java, Node, and .NET all running in parallel.
You run a single process, and SQLite is a library, so SQL operations and the database itself are built in. Your 'budget' is something like 100 MB of RAM, because other stuff has to run too.
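As a toy illustration of what that looks like (Python's standard-library sqlite3; the jobs table is made up for the example):

    # Single process, no database server: SQLite is just a file plus a library.
    import sqlite3

    conn = sqlite3.connect("queue.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0)"
    )

    with conn:  # one transaction
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", ("hello",))

    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is not None:
        job_id, payload = row
        # ... do the work, then mark it done ...
        with conn:
            conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
    conn.close()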
All the time-series databases I know of are large, memory-hungry hippos built for distributed compute/Kubernetes. Just a very different use case. If one were built with minimalism in mind, then it could be used.
The benchmark shows that the time to send notifications grows quadratically with the number of notifications per transaction without the patch, while it grows linearly with the patch applied and notification collapsing disabled.