zlacker

Oh please stop using databases as queues. I spent a disproportionate amount of time in the last 20 years undoing that decision. It doesn’t scale at all well.

replies(15): >>ngrill+u >>aidos+b6 >>skytre+f7 >>mianos+S8 >>cerved+fb >>basilg+lc >>blowsk+0f >>baq+To >>atombe+Zo >>vincen+cq >>crazyg+2I >>onlyre+iJ >>maxrev+gO >>sontho+Z21 >>holodu+0Q1

>>hughrr+(OP)
Look at SKIP LOCKED in PostgreSQL and MySQL.

replies(1): >>hughrr+e1

>>ngrill+u
It’s not that. It usually ends up in the same storage engine as the main business functionality which leads to IO contention. The some sick and twisted bastard does a join to it from somewhere abhorrent which means it requires some serious heavy lifting to fix it.

This is unfortunately the status quo in the slightly less ideal universe than “seemed like a good idea at the time”

Use SQS or RabbitMQ or something.

Edit: Also if you’re using something SQL for queues you’re going to have to build out your monitoring stack yourself based on your implementation rather than suck up a metrics endpoint with queue stats or pull from cloudwatch.

replies(3): >>jbvers+d2 >>glogla+s6 >>flefto+l7

>>hughrr+e1
I dunno. If it’s usually the same problem, it should be easy to identify and fix.

Unless we’re talking about a high load, there should be no problem doing this.

replies(1): >>hughrr+h3

>>jbvers+d2
It’s only the same problem until someone dumps 25,000 entries in your queue.

replies(2): >>kasey_+Sb >>crazyg+NF

>>hughrr+(OP)
The solution I use now is to have the queue in the dB and then have a single process pushing to another queue better suited to dealing with workers.

I drank from the “keep your queue out of the dB” koolaid once, but then you end up worrying about transaction boundaries and atomic writes to the queue and the dB. It totally depends on your workload, but in my case, I’ve found the halfway solution of the queue in the dB so you get to write to it within your transactions, with a dedicated queue for the workers to be a much better solution.

replies(2): >>hughrr+Q8 >>AtNigh+ne

>>hughrr+e1
The benefit of having the queue in the same engine as the main business functionality is that you can do stuff in a single transaction.

Having the queue on its own means you have to handle exceptions like "job is done but queue didn't get updated" or get into distributed transaction which is very different challenge.

Also, operationally having one inherently stateful component (db) versus two (db, queue) makes things a lot simpler to manage, deploy, update, etc.

Not claiming this is a perfect solution, just that the tradeoff isn't obvious.

replies(1): >>hughrr+59

>>hughrr+(OP)
This please. I feel like "How to Get Away with Just PostgreSQL" and the GP comment falls squarely under being too preoccupied with whether you could, you didn't stop to think if you should.

Whatever happened to use the proper data structures for the job? PostgreSQL and MySQL are, at the end of the day, b-trees with indices. Throw in relational properties and/or ACID too. Those aren't properties you need or want in a queue structure.

I know I don't have a solid argument against not doing it; it's just experience (and dare I say, common sense) telling me not to. Not quite like parent but I spent the first two years of my professional career in a team that had the brilliant idea to use DBs as queues. The big task I partook in for that stint is moving them off that v2 into a v3 which used---wait for it---Redis. Everyone's quality of life improved with every migration, proportional to the size of the v2 cluster we retired.

replies(4): >>zigzag+rb >>lolind+0t >>Seattl+4J >>e12e+IE1

>>hughrr+e1
So use a separate database.

Is contention with the application your only objection? That’s pretty weak.

What’s the real compelling reason not to use a database? You haven’t said.

replies(1): >>hughrr+J8

>>flefto+l7
Contention, architectural separation, you have to build your own monitoring stack, not transactional with concerns outside the database without introducing distributed transactions and risk, no routing or distribution capability, you have to build it yourself, point in time message durability is somewhat dubious depending on your acknowledgement process which of course you had to invent yourself as well.

Not to mention numerous database administrators crying into their cups of coffee.

Enough?

Like I said 20 years of people fucking this up is my experience.

replies(2): >>flefto+w9 >>deskam+Kh

>>aidos+b6
Are you sure it works properly when something fails. That’s a wonderful situation I like to ask people because the answer is usually no. Then someone gets billed twice for something and then the guy who said this was a good idea gets shot.

replies(1): >>aidos+xa

>>hughrr+(OP)
Skype at its peak used a postgres queue extension, pgqueue, at its core. Seemed to go OK for them. The extension did work around some limitations when running a pure sql queue in postgres. These days I would run Pulsar but if you only have a million users and already have postgres you will be fine.

replies(1): >>conrad+vY

>>glogla+s6
The point of queues is sequencing, storage and decoupling of events for the sake of scalability and durability.

By putting it in the same transactional store with the same transactional boundaries you’re instantly shooting the whole point.

Not only that, most queues define boundaries between different logical systems where transactional boundaries do exist. At which point your database’s transaction scope is extremely limited in capability.

In the real world of messaging transactions span more than just the database engine.

It’s just bad architecture. Full stop.

replies(2): >>victor+6c >>dragon+yq

>>hughrr+J8
Meh. All your objections are hand wavey and not factual.

Databases actually work fine as a queue but emotionally you don’t like it. That’s fine it’s just not real strong objections.

What you have not said is “it physically does not work”, and that’s because it does work fine.

replies(3): >>hughrr+2a >>philda+ta >>rmetzl+1f

>>flefto+w9
I’m not saying it doesn’t work. I’m saying you’re shooting your toes off down the line.

>>flefto+w9
I don't think he's saying 'it cannot be physically made to work (given enough effort and discipline)'. The impression I got was more like 'if you go down this route then the incremental path of least resistance leads somewhere bad'.

>>hughrr+Q8
Well, I only ever enqueue idempotent work, so again, in my scenario it can fail and be rerun, but I’m also unsure about what you think is a bad idea here.

Honestly, I had more code managing two phase commits so I would only push to the queue when I was sure (ish) the dB commit was ok.

Say you need to trigger something to send an email, but let’s say that there’s an error condition after and the transaction is rolled back. How do you handle that failure? In my scenario the queue is rolled back too, no harm to foul.

>>hughrr+(OP)
In your experience, roughly at which point did it stop scaling and why?

replies(2): >>rmetzl+Je >>hughrr+vp

>>skytre+f7
> Whatever happened to use the proper data structures for the job?

This so much. People too often treat databases as magical black-boxes that should handle anything. Database is most often the bottleneck and choosing the proper storage engine with appropriate data structures can be 100x more efficient that just using the defaults. 1 server vs 100 can definitely make a noticeable difference in costs and system complexity.

While premature optimization is bad, choosing the right tool for the job is still somewhat important and will usually pay off in the long run.

replies(1): >>mirekr+Bg

>>hughrr+h3
I’ve seen all manner of hokey queue implementations in sql going back about 20 years and all of them could handle 25k enqueue bursts. That wasn’t a problem for a Sybase database on a commodity host circa 2000.

I think if I were going to argue against using DBs as queues it would be around: heavy parallel write use cases, latency concerns of both reads/writes and scaling to millions of events per second.

If you don’t have those concerns using a properly normalized and protected schema (which you are doing anyway right? Cause if not you are already shooting your toes off) for queues goes a very long way and removes a very big operational burden and tons of failure modes.

replies(2): >>justso+0i >>hughrr+Sq

>>hughrr+59
Having a separate queue makes sense across application boundaries. If you need queuing mechanisms in your own application you should be able to get by with a dB for 99% of use cases

>>hughrr+(OP)
For the scale of most tasks, the existing RDBMS that a small app is using for its data is perfectly sufficient for the queuing needs of said app. This reduces complexity in the app and removes the need to maintain additional tools.

Frameworks like Laravel make this easy because they allow you choose different backends for your DB, queue, and cache, among other things, all or some of which can be an RDBMS like Postgres or MySQL/MariaDB.

When your app's need outscale what the RDBMS can rationally handle, then you can dedicate the resources to switch to a more specific tool as your growth has indicated you should.

I can't say what the percentage of apps that can handle things this way, but the anecdotes mentioned in this thread are clearly the cases where growth outstripped what a traditional RDBMS like Postgres are well-suited for, and so migrating was the right thing. The probably was likely identifying the need or delaying the transition until it was too painful. I don't think there's a magic bullet here, but I also don't think that starting with a dedicated queuing store tool is also always the right thing to do.

>>aidos+b6
The outbox pattern. Can in many cases be solved by using topics.

replies(1): >>aidos+Rf

>>cerved+fb
I also have a lot of issues with people using tables and cronjobs instead of just using queues.

I once was the one implementing this stupid idea. It's very easy to shoot yourself in the foot. It seems to be ok at first, but it is a stupid idea and will give nightmares to the operations team.

It works as long as the table is small. It gets especially problematic when there are sudden spikes in the workload, so that the table is growing rapidly. People don't test these scenarios enough.

Often it's important to do the jobs in the order they came in. This means you need to have the index set up this way or you'll end up with full table scans. You also need to remove jobs which are done, but this will also mess with the tables performance.

There are tricks with databases to help scaling this, but at the end of the day it's much simpler to just use a queue.

>>hughrr+(OP)
You’re dealing with survivorship bias - those companies whose products were successful enough that they needed your services. You don’t speak to the teams who went bankrupt because they spent too much time fiddling with RabbitMQ, when Postgres would have been fine.

As always, there are trade offs, no silver bullets.

replies(12): >>2muchc+lg >>alunch+3i >>skytre+0k >>paxys+1n >>robwwi+4s >>dorfsm+sx >>cortes+wI >>ikiris+k01 >>rantwa+r01 >>piokoc+p11 >>pyuser+881 >>rkk3+DP1

>>flefto+w9
The range where databases work fine is pretty small. Sudden spikes in job creation will kill the performance of the database. Often enough the job creations come from external systems you can't control.

>>AtNigh+ne
Ah ha. Yup, that’s exactly it. Thanks for the pointer.

Edit: can you explain by what you mean by solving it with topics? I thought topics were a broadcast mechanism?

replies(1): >>AtNigh+UL6

>>blowsk+0f
In my experience proper abstraction is the key. If you have a clear abstraction and structure you can replace the implementation when you need to without too much disruption. If it’s too leaky you’re screwed.

replies(3): >>goodol+Ng >>hughrr+5o >>aparso+ao

>>zigzag+rb
I think your "most often" is more like 0.01%, I'd say the inverse is true, that _most_ would be fine with single sqlite host or something like rqlite.

replies(2): >>otoole+Wj >>zigzag+kl

>>2muchc+lg
100%! Create a good enough solution and put it behind an abstraction that remains stable when you later create an ideal solution

>>hughrr+J8
> you have to build your own monitoring stack,

How is this done in Redis automatically? At some point you are writing queries (albeit Redis ones) to pull metrics. Other queueing systems may expose an API but there is always some level of integration. With Postgres/other db I would write SQL which is their API and a powerful one at that.

I can couple events with triggers to auto-generate other events etc, have built in audit capability, roll up reporting etc, all with standard SQL. (all in the context of monitoring and reporting)

replies(1): >>hughrr+ip

>>kasey_+Sb
I agree.

Going from a single database to a (database + queue) means two server processes to manage, maintain, observe, test etc.

I actually start with SQLite to reduce as much distributed state as possible, then move to something else once it’s proven it will not work.

>>blowsk+0f
You're not wrong, but if anyone on the team has already worked with a queue system that they need to manage then utilize something that scales. Otherwise there's no reason to use the cloud solution you're hosting with which hand hold you to make it work quickly with your integration.

>>mirekr+Bg
rqlite author here. Happy to answer any questions about it.

https://github.com/rqlite/rqlite

replies(1): >>mirekr+Ko

>>blowsk+0f
> You’re dealing with survivorship bias - those companies whose products were successful enough that they needed your services

In fairness, Redis has only been available for 12 years so someone who has been in the industry longer has probably encountered systems using DBs as queues for no better reason than an alternative was not available when the system was made. (Rabbit just a couple years older I think.)

But in this day and age, you have better options to start with c'mon.

> they spent too much time fiddling with RabbitMQ, when Postgres would have been fine.

Rather ironic claim in a thread full of tips, tricks, and gotchas just to make PG behave like a queue no?

I can't speak for RabbitMQ personally but Redis has got to be among the simplest drop-in dependencies I've ever encountered. For a simple queue, the defaults do out-of-the-box.

Gut feel also says, Redis defaults for queuing would also serve more than PG defaults for queuing so you could leave it untouched for far longer to focus on your product, get to profitability, etc. Feel free to debunk with data.

replies(5): >>hughrr+qo >>stickf+yy >>boardw+Ez >>sh87+Yz >>derefr+TE

>>mirekr+Bg
What would you then consider to be the most common bottleneck?

I agree that there are many cases with low workload where that would be plenty.

replies(2): >>mirekr+Sn >>solips+jI

>>blowsk+0f
You are wayyy overestimating how complex something like RabbitMQ or Redis is. You don’t need to hire entire teams of engineers or go bankrupt setting up an instance. It is less work than implementing a production-level queue in Postgres for sure.

replies(4): >>zomgwa+Uo >>aether+6p >>fauige+op >>mbrees+Ap

>>zigzag+kl
Most common bottleneck is lack of competence.

Direct visible effects are wrong decisions entangled in spaghetti-like complexity.

It's hard to reach technical bottleneck in well designed systems. Computers are really fast novadays. They will vary greatly depending on what kind of system it is. Out of resources – cpu, memory, network, disk io – likely the weakest of them will be saturated first – network. But that's not a rule, it's easy to have system which will saturate ie. CPU before.

replies(2): >>hughrr+Lo >>zigzag+6u

>>2muchc+lg
Yes. Use a queue abstraction for which there are numerous queues available off the shelf for!

>>2muchc+lg
In my experience, great abstractions without a good data migration blueprint is as about as useful as no abstraction at all.

In this example - great, you have a “queue” abstraction. When you switch to RabbitMQ or Postgres, how do you move your existing data without a quality of service or data loss? It can be difficult with production datastores even if the abstraction within your service is beautiful.

replies(1): >>lolind+yt

>>skytre+0k
I've got about 6 years operational experience with Redis. I would not use it for a queue. I struggle to consider using it as a data store these days as well with other options on the table. But that's a whole new thread so I'm not going into details here.

If you want a queue, get a queue.

If you want a relational database, get a relational database.

If you want a key-value store, get a key-value store.

Don't cross the streams.

replies(2): >>mikekl+S61 >>Too+YY1

>>otoole+Wj
Are you planning on adding websockets or something similar in the near future to support things like ie. data change notifications [0]?

[0] https://www.sqlite.org/c3ref/update_hook.html

>>mirekr+Sn
This is the best post in this thread.

A lot of people don't see the effects of their decisions. They leave a company after 3-5 years and go and work somewhere else where they get to make the same mistake again. The bottleneck indeed is lack of competence.

As for technical bottlenecks, it's quite easy to hit a wall. Be it through layers of stupid or unexpected success. We have unexpectedly reached the limit of what is possible with x86-64 on a couple of occasions due to stupid decisions made over 10 years previously for which there is now no longer the budget or attention to fix.

>>hughrr+(OP)
the other side of the coin is that queue servers are databases, just optimized for the use case.

>>paxys+1n
I agree. I’ll also add that Redis is a level easier to operate than RabbitMQ.

>>hughrr+(OP)
You're not wrong, but every solution involves compromises. More broadly, I would argue, though, that almost nobody actually needs queues.

Sure, there are cases were you actually need strict ordering, but they are relatively few, in my opinion, involving things like transaction processing or other areas where the order of operations change the effect of the operations and must be strictly sequential.

Ordering itself brings significant technical challenges. For example: With a queue, you can only have strict ordering as long as you only process one item as a time, with no concurrency. Ordering also complicates handling of failures and retrying, because it means every newer item has to wait until the failed item has been dealt with; if you use things like dead letter queues, order is violated.

In almost all use cases -- anything from image processing to newsletter delivery -- a better system is to have an unordered list of items that anyone can take items out of, in any order. Then you throw as many workers at this system and design it so that there's minimal locking needed to prevent workers from stepping on each other's.

There are queue systems that have a purpose in a stack, specifically pub/sub brokers, which can be used to schedule work along several dimensions. You use the pub/sub system to signal work to be done, but each payload just refers to some state that's stored elsewhere.

replies(2): >>hughrr+xq >>gravyp+Dq

>>paxys+1n
Have worked at multiple companies that successfully used redis or rabbit with teams of less than five engineers.

It's a little insane that a highly rated thread on HN is telling people to use postgres as their queuing solution. The world is wide, I'm sure that somewhere out there there is a situation where using postgres of a queue makes sense, but in 99% of all cases, this is a terrible idea.

Also, SQS and nsq are simple.

replies(2): >>politi+3M >>Gordon+MO

>>deskam+Kh
Um, I'm not suggesting using Redis. In actual fact I said elsewhere I wouldn't use Redis for queues.

As for triggers, reporting, audit, I'm laughing now because you miss the point. That's another bunch of IO in your monolithic black box, and you're still building it.

Start here: https://www.rabbitmq.com/prometheus.html

replies(1): >>Jarwai+D71

>>paxys+1n
The issue is not necessarily the complexity of RabbitMQ or Redis. The complexity comes from having to manage another (stateful) process that has to be available at all times.

replies(1): >>hughrr+Eq

>>cerved+fb
It stopped scaling when we put in the PO for a $150k x86-64 box to run it on and finance said WTF.

replies(1): >>cerved+fL

>>paxys+1n
“Production-level” means different things to different groups. Many (most?) groups don’t operate at the scale where they need a specialized queue in a dedicated message broker. Simple queues in a DB will work fine. Even if it isn’t very complex, why not already use the infrastructure you probably already have setup — an RDBMS?

Now, if you’re using a nosql approach for data storage, then you already know your answer.

replies(1): >>gravyp+8r

>>hughrr+(OP)
Exactly, GP does not sound like someone who has worked on anything with decent scale. Because if you did, you would know that it is a really, really bad idea.

replies(2): >>lolind+xu >>maniga+jA

>>atombe+Zo
I think you're missing something important here by applying the worker pool simplification to the idea. Order is important i that case as well. It forces both fair scheduling of the work and the metrics from the queue can easily describe the overall latency of the work being done and the capacity and utilisation. The latter can be used to scale the consumers of the queue and/or alert on when things aren't working properly.

If you throw a proper messaging broker on the table like RabbitMQ then you're getting message acknowledgement as well which is important if you must do the work. A failed consumer will result in the work being rescheduled on another consumer transparently. Of course there are metrics you can see and get heads up when your consumers are not working.

Ergo, don't simplify something too soon. Even reliable delivery of newsletters is rather important otherwise why would all the tracking pixels exist? (they are a form of message acknowledgement)

replies(1): >>atombe+M61

>>hughrr+59
> The point of queues is sequencing, storage and decoupling of events for the sake of scalability and durability.

The point of queues in any particular application is some subset of that; the calculus of implementation approaches that make sense depends not on the abstract point of queues in general but on the concrete point of queues in your use case.

>>atombe+Zo
I use Queues for retries, buffering, and parallelization. It's pretty easy to write most programs to support at least once and to run multiple queue workers that you auto scale.

>>fauige+op
Yes. Pay Amazon to do it and add use all that saved time to add business value instead.

They'll also manage the consumers of the queue and scale them too! Serverless is bliss.

>>kasey_+Sb
It wasn't a problem for Sybase on a commodity host circa 2000 because clearly that host wasn't doing a whole lot of other stuff. It's a big problem for our 48 core nodes with 2TiB of RAM and a metric shit ton of DAS NVMe. Ergo anecdotes don't scale either.

To clarify we just moved this entire problem to SQS.

>>mbrees+Ap
My main concern would be monitoring. Most queue systems connect to alerting systems and can page you if you suddenly stop processing thongs or retrying the same query many many times. For a DB, since the scope of access is much larger, you don't get these sort of guarantees for access patterns and you essentially need to reinvent the wheel for monitoring.

All to save 2 to 3 hr of Googling for the best queue for your use case and finding s library for your language.

It makes sense if you don't care about reliability and just need something easy for many many people to deploy (ex: Drupal).

replies(1): >>bcrosb+qx

>>blowsk+0f
Great comment: Been reading Taleb’s Black Swan this morning on ascertainment bias, so this resonated well.

>>skytre+f7
What has me wanting to stick with postgres is that I work on a small team (two developers) and adding more technologies to our stack is extra overhead that's hard to justify. At our peak we're currently handling one request per second, and postgres for a queue is more than sufficient for that. Is there any good reason for us to add, learn, and maintain a technology neither of us yet knows? Or would we do just as well to abstract away the queue in the code so that we can switch to redis when we do run into scaling problems?

replies(1): >>skytre+uz

>>aparso+ao
Isn't migrating a queue super simple? Have a window in which you only add to the new system and you listen to both. When the old queue is empty, delete it.

If you need to keep track of old data, then yes, migration is hard. But queues don't save old data—that's the point of a queue.

>>mirekr+Sn
Competence is expensive :) While I mostly agree, even well designed systems have (sometimes considerable) tradeoffs.

> It's hard to reach technical bottleneck in well designed systems. Computers are really fast novadays.

I have been listening to how fast moderen computers are for the better part of the past two decades, yet as I user I still have to deal daily with too many of slow software and slow web services.

replies(1): >>mirekr+iG

>>vincen+cq
Or maybe GP is someone who has worked on projects from their infancy. If your queue will be handling less than a request per second for the foreseeable future, why not just use the technologies already in your stack and plan a migration path now? What do I gain from adding Redis (and thereby making my system even more distributed, with all the hazards that carries) if my project isn't even guaranteed to make it to 10 rps before going bankrupt?

>>gravyp+8r
We use pingdom that hits a page that gives it the health of various systems. Queue included.

> All to save 2 to 3 hr of Googling for the best queue for your use case and finding s library for your language.

The cost of using a new piece of tech in production is not just 2 or 3 hours.

replies(1): >>gravyp+wz

>>blowsk+0f
There is a lot of space between Postres and RabbitMQ. If performance is a factor it takes alot of work to make a Postgres table fast enough. These days, and for the past at least 5 years, you're better off using a service like SQS, or setup Redis yourself.

>>skytre+0k
There's one incredibly valuable feature you get with PG that you will never get with Redis or RabbitMQ or any other task queue: Enqueueing tasks transactionally with your unit of work.

If your load requirements fit using your database as a queue, it can radically simplify your system.

I've personally done a fair bit of migration from formal queue systems into an RDBMS. You simply get a lot more control and visibility that way. But I use task queues heavily too. It's all very application-specific and I don't think you can generalize.

replies(2): >>politi+5L >>Gordon+XN

>>lolind+0t
> Is there any good reason for us to add, learn, and maintain a technology neither of us yet knows?

Absolutely and that reason is, you are still a small team, with a small user base to boot. That's fantastic opportunity to learn a new technology and build on it properly! Remember everything is easier in software engineering if you assume you have no users[1] and your situation is as close as it gets to this ideal. Leverage it.

Plus, as me and others keep saying, Redis (and other proper queues) isn't a complex addition to your infra. This isn't Hadoop, or Kafka, which is a very special type of queue (one way to put it, at least).

> one request per second, and postgres for a queue is more than sufficient for that

Yes I agree but...

> Or would we do just as well to abstract away the queue in the code so that we can switch to redis when we do run into scaling problems?

What I read when I see such statements is this mythical software engineering ideal that with enough abstraction, a migration is just a matter of writing a new class that implements some interface and then changing a config. For a sufficiently complex app infra, that happens almost never because you could never keep the abstraction leaks to an acceptable level.

Another thing, abstraction does not solve all your problems if the underlying implementation is poor fit to begin with. Let me paint you a plausible scenario:

Once you are large enough, you might find your PGQ acting weird and you realize it's because someone in the team wrote code that accesses your queue table like it's an actual table of records, not a queue. So you think, okay let's prevent that from happening. Maybe you add users and permissions to distinguish connections that need to access between tables proper and queue. Maybe you start writing stored procs to check and enforce queue invariants periodically.

Well, guess what, all those problems would've been solved for free if you invested maybe one work day getting a Redis server running when you were a two-person op serving one request per second.

Lastly, scaling a relational DB is an entirely different beast from scaling a queue. Scaling anything does not ever come painless but you can reduce the suffering when it comes. Would you rather scale PG so it can keep acting as a queue or scale a queue that's, you know, really a queue in the first place? Heck the latter might even be solvable by throwing money at the problem (i.e., give it more compute).

[1] Except for the part where you need to make money, of course.

replies(1): >>yongji+r31

>>bcrosb+qx
If you're at the scale where your postgres db can be used as a queue and no one on your team has experience running a these systems (most startups) then pretty much anything will work to begin with and as long as you have a clear interface that separates your code from your deps for the queue it'll be easy to swap out.

At $JOB-1 I wrote a `Queue` and `QueueWorker` abstraction that used an environment variable to switch between different queue backends while providing the same interface. Because of this I got everyone up and running with Redis lists as a queue backend and then could add in things like MQTT or RabbitMQ as things evolved. I also defined very loose constraints for the queue interface that made it so the implementer. Essentially there was a `push()` which added something into the queue or failed and returned an error. Then there was an `onWork()` which was called whenever there was work "at least once" meaning your system had to handle multiple instances being delivered the same work item. We would only ack the queue message after `onWork()` completed successfully.

There's not really anything preventing a team from doing this, putting a pin into it, and coming back to it when there's a scalability or reliability concern.

>>skytre+0k
You (or whoever is doing ops) will have to pay with your time to operate Redis until the product dies though.

I don't think adding another component (especially one with storage/backup requirements/more complex failover procedures) should be taken lightly.

Paying attention to the tips/tricks/gotchas is something you'd pay once and hopefully document in source code/documentation.

If you piggyback off someone else's work (another team at your job, a cloud-based solution that handles everything, etc) the calculus would change, of course.

replies(1): >>skytre+UR

>>skytre+0k
Using a tool without knowing reasonable bounds of the domain, current requirements, and how the tool (redis in this case) solves the problem isn't good advice.

Case in point our team went with Redis, just the default, use it blindly without fully understanding our requirements and how redis helps scale.

2 years later we spent 2 sprints, holding back the release trying to understand RDF vs AOF, and why we're seeing massive spikes in consumption and performance triggering pod eviction, running comparison tests to prove which works better and explaining why, running qa tests (regression, performance, load), introducing postgres for queuing, redoing our code to bypass the sync mechanism between how data flows between redis and postgres, updating dependenciies, migrating existing customer data (various on site locations), explaining this to team members, managers and their managers, installation technicians, support engineers and presenting it at engineering townhalls as a case of bad decisions.

Not worth it.

replies(2): >>satees+jC >>skytre+2Q

>>vincen+cq
Very few things have "decent scale". 99% of apps can run on a single server with modern hardware.

>>sh87+Yz
I believe you meant "RDB vs AOF".

Also reading your reply I get the impression that "sync mechanism between redis and postgress" was the bottleneck. Wondering if you can add some details around it and also was this something that can't be fixed by fine tuning redis config, rather than completely removing it from your stack.

replies(1): >>sh87+TC

>>satees+jC
Yes I did mean RDB. Thanks for pointing out.

There were many problems but at the core of it, this was us having redis write huge amounts of data to disk very frequently causing this.

We could not reduce the frequency (product would not allow) and we couldn't find a way to make the writes reliably fast.

I like to believe there exists a possible way of handling this, but point being, our team had no time to find out, how redis works internally and have confidence that the new way won't bring up new surprises.

>>skytre+0k
> Rather ironic claim in a thread full of tips, tricks, and gotchas just to make PG behave like a queue no?

There are libraries in many languages written specifically to accomplish the "make PG behave like a queue" part. All these people are explaining the principle of how it would work, but in practice, you can just pull in one of those libraries and be done. (Or, more accurately, you use an abstract external-job-queuing library and tell it to use its Postgres adapter, which in turn uses that library.)

In exchange, you get the ability to not have to set up / maintain any more infra than you already have. Which can be important if you're already "at scale", where every component you set up needs to be duplicated per geographic-region, clustered for High Availability, etc. If you've already done all that for Postgres, it'd be nice to not also have to do all that again, differently for Redis, RabbitMQ/Kafka/NATS, etc.

> Gut feel also says, Redis defaults for queuing would also serve more than PG defaults for queuing

Don't know about what you've got your queues doing, but our service only does (persistent, out-of-process) queuing for two purposes:

1. Global scheduled background-job queuing. (Think "cron jobs.") There are only a static number of these; they don't scale up with load.

2. Lifecycle email dispatch. This scales up with MAU, but with very very low coefficients.

For those cases, Postgres would totally be enough. (Anything would be enough!)

Most "queues" needed in the real world are requirements-light like this. It's a rare company whose product is doing something Apache BEAM-alike with its data, where a high-throughput reliable MQ is a fundamental part of the data workflow. (And those companies, I hope, know that they need an actual MQ, not a piece of software that does queuing in its spare time.)

Our company doesn't use Postgres queuing; we do, in fact, use Redis for it instead. But we only ended up doing that, because we already needed Redis for other stuff; and if you already have Redis in play (including an ops plan for scaling it), then it's the better tool for the job.

replies(1): >>skytre+fP

>>hughrr+h3
What database can't handle an extra 25,000 entries?

That's... nothing. Databases handle billions of rows effortlessly using a b-tree index. So not really sure what point you're trying to make?

>>zigzag+6u
Somebody once said "cheap things are expensive". This idea applies to developers as well. Cheap developers will drive company through bumpy roads towards uninteresting plains. Good developers not only pay for themselves but bring orders of magnitude more cash in. Only thing that touches on this that I can find is "software craftsmanship manifesto".

replies(1): >>zigzag+eT

>>hughrr+(OP)
Can you explain exactly what doesn't scale well?

Databases scale well in general, it's one of the things they're built for.

A single database server with an indexed table (b-tree) and an SSD is extremely performant. (And yes there are other indexes that could be even more performant, but b-trees are already extremely fast.)

But the huge advantage of putting a queue in your database is being able to create transactions across both the queue and data in other tables that needs to change at the same time.

Not to mention one less extra tool to maintain, existing database tools for backups replication etc. automatically include the queue as well, etc.

As well as the fact that it's incredibly easy to make your queue behave according to whatever kind of custom business logic you might need. (Which is where additional indexes and b-trees can turn out to be necessary, and which is precisely what databases are great at.)

So to the contrary -- unless you're at Facebook-level scale, using modern databases for queues generally seems like an excellent choice. (And in the special cases where it's not, that's usually quite obvious.)

replies(2): >>Rapzid+lv1 >>sh87+sK1

>>zigzag+kl
Money. The most common bottleneck is money and customers. Use whatever helps you get new customers faster.

Don't be scared of having to make changes in the future. Do the small amount of work it takes today to make sure your transition in the future is easy.

Transitioning from a SQL queue to redis it's only difficult if you have a bunch of SQL throughout your code. If you have that, you did it wrong.

>>blowsk+0f
Sure, but I also feel like this comment is based on an assumption that everyone works at startups. A lot of us work for larger, already established companies that are past that will we or won’t we survival stage. We are trying to design systems to solve needs for already validated products.

It might seem like everyone works for a startup, but big companies have a lot of engineers, and we still have to solve new problems there even though the company is not new.

>>skytre+f7
Your comment sort of explains why you would use your DB as a queue. It is a big task to migrate to a new system. If you already have Ppostgres or MySQL integrated and deployed. Using it as a queue may be the simplest option.

>>hughrr+(OP)
There are A LOT of use cases for a queue that aren't particularly computer intensive and you aren't worried about ever needing to scale - especially larger than your Postgres database.

So while this might be good advice if you will need to scale, it's certainly not blanket advice.

In the first case, I would much rather just have Postgres and not a separate queue to manage.

>>stickf+yy
Not sure why you were downvoted, but perhaps they were thinking about Redis transactions (lua scripts that can do multiple operations at once). If Redis is your datastore (groan), then you can transactionally enqueue tasks with your unit of work.

replies(2): >>arthur+H01 >>stickf+nK1

>>hughrr+vp
What about in terms of number of requests and requests per second?

>>aether+6p
If your data storage pattern involves a database write followed by a message queue insert, then the ability to wrap those in a transaction can be a good trade-off to avoid consistency failures between the writes.

Avoid consuming that queue directly though -- which is probably what you're thinking when saying this is a dumb idea and I tend to agree. Typically, you want to have a worker that loads entries into a more optimal queue for your application.

Bottom line though, there is not a single best "queue" product. There are lots of queues offering wildly different semantics that directly impact use cases.

>>stickf+yy
You can actually do this with RabbitMQ - it has a feature called "Publisher Confirmations", where you send a message, and then wait for an "ack" in return.

replies(1): >>blowsk+vR

>>hughrr+(OP)
Lean startup begs to differ :p

>>aether+6p
For years I've been using RabbitMQ in small teams, and me as a one-man team too.

As long as you don't need clustering (even a single node can handle some pretty heavy load), it's actually really simple to setup and use - way easier than Postgres itself, for example.

My biggest beefs with it have historically been the Erlang-style config files (which are basically unreadable), and the ridiculous spiel of error messages you get if you have an invalid configuration. But thankfully RabbitMQ switched to a much simpler config file format one or two years back, and I understand the Erlang OTP is working on better error messages and stack traces.

>>derefr+TE
> In exchange, you get the ability to not have to set up / maintain any more infra than you already have.

And, you mean, a PGQ will not need any special monitoring other than the common DB metrics?

For instance, if I ever run a queue, it's just due ops diligence to know the length of the queue, average time an item spends in queue, throughput over time, among others. Are there standard monitoring modules that would check this for a PGQ? Because in exchange for setting up a proper queue cluster, compatibility and documentation for common stuff like this is also what you get.

The first one is particularly a sticking point for me. If you don't do it right, you will end up issuing a COUNT on a table periodically. You might say it's acceptable for most companies, because they don't need high-performance queues, but I specifically need my monitoring to be reliable during times of unexpected high load. Also, ideally, there is close to zero chance that my telemetry is the one that ends up causing me trouble.

Also, why does no one consider the added "surface for human error" that using a DB for a queue introduces. For instance anyone can drop in the DB and perform a query you were not supposed to. A malicious actor can update all outgoing emails in the queue to another recipient. If these are sensitive emails like password reset or OTP, good luck. A dedicated queue process does not allow such operations.

I have to say, it's very perplexing to me that people (especially in HN) would avoid using the right structure just because they have to invest a bit more work.

replies(1): >>derefr+GQ

>>sh87+Yz
Well, by your admission, you used Redis for a problem domain it wasn't suited for in the first place. How is this an argument for using in-database queues?

> use it blindly without fully understanding our requirements and how redis helps scale

I'm sorry I don't get how I could come across as advocating the use of Redis blindly. My point is if your data flow looks like a queue, then use a queue, don't hack a relational DB to become a queue. I think that's reasonable rule of the thumb, not going in blind.

replies(1): >>sh87+sy1

>>skytre+fP
> I have to say, it's very perplexing to me that people (especially in HN) would avoid using the right structure just because they have to invest a bit more work.

You're imagining this as if it's just the CapEx (the work of setting up Redis), but in reality it's the OpEx (the work of maintaining a Redis cluster) that kills you. Think of it in terms of the number of ops salaries you have to pay. At scale, this number goes up in proportion to the complexity of your stack.

replies(1): >>skytre+9T

>>Gordon+XN
If you’re using the same database for your queue as your business entities, then you can wrap both in the same transaction - I think that’s what the parent post is getting at. For example, save a customer entity and add a job to send them a confirmation email. As far as I know, that’s something you can’t do natively if you’re using RabbitMQ and PostgreSQL. Of course you can work around it, but when your workloads are low you don’t need to, which is great for early stage products.

replies(1): >>spulla+TV

>>boardw+Ez
> You (or whoever is doing ops) will have to pay with your time to operate Redis until the product dies though.

Repeating my other comment: do you mean to say using an in-DB queue will not need special monitoring other than what's already standard metrics for DBs?

Since a queue is a different structure, I would want to know metrics that might be unfamiliar for standard DB monitoring. Queue length, queue throughput, etc. Can I get that from my DB queue with negligible cost?

Another thing, there might be standard practices and patterns for in-DB queues, often encapsulated in a nice library/middleware. But as databases are not queues there is no assurance from the vendor that those patterns will remain applicable in future versions.

> Paying attention to the tips/tricks/gotchas is something you'd pay once

Actually, I disagree with this. Everytime someone connects to that database, they have to remember that one special table which is used as a queue, and remember to tread differently where it is concerned.

Setting up a queue? You do that once and everyone who interacts with it will not forget it is a queue. Heck you can't even do your usual DB operations there. How can you not remember it is a queue?

>>derefr+GQ
Actually, if you read the rest of my previous reply (as opposed to an offhand comment at the end) you will see that I'm considering the logistics of operating an in-DB queue as well.

Using an in-DB queue doesn't give you zero OpEx does it? Maybe you can make the argument that it's cheaper but it's cheaper for a reason: the job wasn't done properly, so to speak.

Both options introduce new failure modes and op costs into your system. Might as well do it properly if (and this is a huge IF in my opinion) for slightly more cost. When you run into a failure it's standard, maybe the solution is even one Google search away as opposed to realizing, one hour into a late-night debugging session that, shit, that table should be treated as a queue!

replies(1): >>derefr+tZ

>>mirekr+iG
True

>>blowsk+vR
In that case it doesn't help you that much since you can't transactionally send an email but you would get at least one attempt.

replies(1): >>blowsk+s01

>>mianos+S8
I worked at a place using pg_queue (the person who created it worked there before going to Skype IIRC) and maybe it was working well operationally but it was not a great experience to develop on it. It was hard to generate events and debug/observe them (it was height years ago so the details are a bit fuzzy).

MS also moved from it afterwards when they acquired Skype but I guess it was more because of the shift to SQL Server.

replies(1): >>mianos+Tz1

>>skytre+9T
I read your whole reply. I didn't see any mention of anything I'd consider an OpEx cost.

Re: Monitoring

Presuming you're already monitoring anything at all, adding monitoring for a new piece of infrastructure is a pure CapEx cost. You presumably already have monitoring infrastructure running. Its ingestion, by design, will be 99% idle — i.e. it won't need to be scaled horizontally proportionally to the number of components. The only thing needed, then, will be careful up-front design for the monitoring.

Which is usually also a cost already paid for you in advance when you use an abstract job-queue library. They all know how to expose Prometheus metrics endpoints, and they do the accounting to serve those endpoints efficiently (usually using process-local per-job-producer and per-job-consumer counters, which you must then roll up yourself at the PromQL level, taking the irate() to find spikes.)

Re: AAA

> Also, why does no one consider the added "surface for human error" that using a DB for a queue introduces.

Because the type of person who reaches for their DB first to solve a novel problem, is the same type of person who understands and makes use of their DB's security features, to the point that doing more DB-security-config isn't a complex one-off problem for them, but something they can whip off in a few minutes.

For a simple option, you can create two service-account DB roles: a queue_producer, and a queue_consumer. You can put all the job-queue implementation tables in a schema owned by the queue_producer; and then grant the queue_consumer SELECT privileges on all the tables, and UPDATE privileges on some of them. Then, nobody but the job_producer (or a DB superuser) can create or destroy jobs; and nobody but a job_producer, a job_consumer, or a superuser, can read or modify jobs. (Your job-queue abstraction library within your service usually maintains its own DB connection pool anyway, so it's no sweat to have those connections use their own credentials specific to their job-queue role.)

For a more complex — but perhaps useful? — option, the tables themselves can be "hidden" behind stored procedures (DDLed into existence by the abstract job-queue library), where nobody has any privileges (incl. SELECT) on the tables, only EXECUTE rights on the sprocs. And the sprocs are carefully designed to never do anything that could have an unbounded CPU time. Then anyone can "check up on" or even "insert into" the job-queue, but nobody can do anything "funny" to it. (My god, it's an encapsulated API!)

Once again — the libraries that abstract this away, already think about concerns like this, and choose one or the other of these options. That's why libraries like this exist, when the "core premise" is so simple: it's so there's a place to put all the fine details derived from man-years of thought on how to make this approach robust.

-----

On a completely different track, though: having a queue in the DB can sometimes be the optimal (i.e. the "if we had infinite man-hours for design + implementation") engineering decision. This case comes when the thing the queue is operating upon is data in the DB. In such a case, the DB data modification, and the job's completion, can succeed or fail together atomically, as part of a single DB transaction.

To accomplish the same thing when your queue lives outside the DB, you usually end up either needing some really gnarly distributed-locking logic that both your DB and your app layer need to know everything about (leaky abstraction!); or you need to completely remodel your data and your job queue into an event-streaming paradigm, so that you can "rewind" one side when the other side fails.

replies(1): >>skytre+s41

>>blowsk+0f
You say that as if rabbitmq takes more than an 8th grader to set up in a few hours?

replies(1): >>Gordon+T51

>>blowsk+0f
no. he/she actually has a point. a database is not a queue

https://mikehadlow.blogspot.com/2012/04/database-as-queue-an...

replies(1): >>geggo9+sh6

>>spulla+TV
It doesn’t help guarantee the email is sent, but it makes it easier to find and handle unhappy edge cases, which is where a lot of time is spent in an early-stage startup.

>>politi+5L
Real time trading systems can totally get away with only using Redis, outside of historical data (which obviously you'd stick into Postgres or your favorite flavor of DB)

>>blowsk+0f
Not sure. All messaging solution brokers walked away from relational database based storage long time ago (I've seen how badly JBoss 4 messaging with persistence sucked) and for a good reason as either database was occasionally killed by messaging traffic, or messaging traffic was crawling because database was too slow.

In these days with Docker, ready cloud solutions getting the message broker is so simple that there is no reason to use database as a broker. If we know that our traffic will be so small, just install DB and message broker on the same VM, to avoid spending more on hardware or hosting.

>>hughrr+(OP)
It can absolutely scale well if you design the database properly. However, it’ll probably be fairly high latency compared to a Redis or Kafka stream.

The things that often make databases appear slow is that they’re typically configured for large random reads rather than small uniform writes. And they have insane initial response times because of all the IPC they typically do when starting a session.

But aside from that, if you use a heap table without an index you’ll can probably get similar throughout to Kafka. And if you turn Fsync off you might even get up to REDIS speeds.

replies(1): >>onepla+i31

>>sontho+Z21
It's possible at scale, but the engineers that can do that are usually 1:20 outnumbered within most organisations. (when scoping out FAANG and co) Asking a developer to design a database is not even normal process today, unless the ORM does it for you, very little actually is done with the database. It might as well be a flat file...

replies(1): >>api+w31

>>skytre+uz
> Absolutely and that reason is, you are still a small team, with a small user base to boot. That's fantastic opportunity to learn a new technology and build on it properly! Remember everything is easier in software engineering if you assume you have no users[1] and your situation is as close as it gets to this ideal. Leverage it.

I have to disagree. Of course code quality is important, but building things "properly" because "we may need it later" is a great way to kill a project with complexity. KISS, YAGNI. An early startup is, IMHO, not a good place to learn about new frameworks while getting paid - you're on borrowed time.

Make a back-of-the-envelope calculation about how much throughput you need. E.g., if you expect to have 10,000 users, and each may make one request per hour, you're dealing with 3 qps. Anybody who wants to bring in a new dependency for this, needs some talking to.

(If you already need Redis anyway and it's a better fit than Postgresql, then sure, go ahead.)

replies(1): >>skytre+k81

>>onepla+i31
That’s sad. I learned relational DB design before I even really got into systems programming and it was really a valuable thing to learn. You can be so much more efficient and less error prone with normalized data structures.

There’s a reason SQL is still around after all these years. It’s basically math and while the syntax is a little moldy the principles are eternal.

>>derefr+tZ
> For a simple suggestion, the job queue can live in a schema that is owned by a job-queue service-account role. Nobody other than that user (or a DB superuser) can issue any query in there.

Yes service accounts are fine and dandy but [a] isn't that more hoops borne by choosing the wrong tool for the job (gotta stay mindful of those permissions, you can have an Ansible typo, etc) and [b] as I said, proper queue services would not even allow tampering of enqueued data. For no extra cost other than installing the service. In contrast, your service account falling into a malicious actor is still a threat vector.

> For a more complex — but perhaps useful? — option ...

Now this is just a queue service with extra steps running in a relational DB instead of natively as an OS process. You did cite it as just an option but I don't see why this is an attractive option.

> I didn't see any mention of anything I'd consider an OpEx cost.

It seems to me we have differing definitions of OpEx and, judging by your previous comment, you value this almost exclusively in terms of "the number of ops salaries you have to pay". Even if I play by that definition, I could tell from experience (and also corroborated by other commenters here, in other subthreads of this discussion), that operating a Redis cluster does not need extra warm bodies. The people who monitor that your app responds, your servers aren't running out of memory, could take on the task with little to no additional training.

The experience I want to cite, bearing in mind of course that everyone's mileage varies: in my previous job, customer base of 2M and growing across two countries, we operated and monitored a redundant Redis cluster with an ops team of 3. In my first job I talked about in another subthread here, we managed three high-traffic Redis queues (and a few other smaller-traffic special cases) with a team of 7. In both jobs, we also did active across-the-stack development; we weren't dedicated to monitoring, nor to the components related to the queues.

In fact I would argue running a DB (queue or no queue) is just more complex than running a queue service. Rather telling is the need for "DB Administrators/Specialists" but no such equivalent for Redis or other proper queueing solutions.

> Which is usually also a cost already paid for you in advance when you use an abstract job-queue library. They all know how to expose Prometheus metrics endpoints...

Honestly, this sounds fine but I'm rather wary of the caveats that might be present. How do they do it efficiently? What do I need to keep in mind to keep things efficient? As such, unless you can point me to your preferred job-queue library which does all these wonders, I have to reserve judgment.

Edit: I just remembered mentioning this in another subthread. Even if you have a fantastic library/middleware abstracting all those queue concerns, that ties you in with that library. If someone wants to build integrations into your queue, they have to play by that library. If majority of your app is in Java (and so is the library), and Data Science (who, in this example, uses Python, not improbable to happen) wants to share your data pipeline, if that library isn't available for them, tough luck I guess? More dev time for someone.

And also, whatever underlying DB features or semantics your library might rely on in order to enforce a queue structure, you can't be assured that future versions of your DB would support those features because, you know, your DB isn't really in the business of queues. It opens you up to running an outdated DB version just to keep that queue functionality.

replies(1): >>derefr+1b1

>>ikiris+k01
To be fair, you can pull and run the Docker image in a few minutes, and the defaults will work out-of-the-box for a lot of folks.

If you don't need clustering, RabbitMQ really is pretty simple to setup.

>>hughrr+xq
If latency is important, I would argue that ordering doesn't help you.

That is, if you have the computational resources to service your entire queue within the required latency, then ordering doesn't matter, because workers could simply claim the first available random task, and it doesn't matter who finishes first or last. If you don't have the resources, then ordering still doesn't matter, because you are blowing your latency budget anyway.

For example, say users are uploading photos which go into a queue for scaling, cropping, etc. Let's say 10 users per second are uploading a photo. In the best-case scenario, the queue is empty, so all users should see their photo fully uploaded quickly. Worst-case scenario, your queue has a million tasks pending, which benefits a handful of early users who see their photos completed, whereas most users will simply wait forever. Prioritizing early users (i.e. queue order) doesn't benefit anyone when everyone will be miserable, so it's an arbitrary choice.

The ordering imposed by queues can create an illusion of fairness when task execution time (i.e. end-to-end latency) is not taken into account. If one user uploads a 1 gigapixel photo, it's most likely going to take up more processing time than other users, and again, there's no reason to prioritize that user over any other user just because their request arrived earlier.

If you have random-order delivery, you can still measure latency, capacity, and utilization, as well as have retries and acking, and utilization-based scaling.

replies(1): >>Rapzid+2w1

>>hughrr+qo
What would you use redis for, if anything?

replies(1): >>yawara+Gc1

>>hughrr+ip
Currently at work we use a postgresdb for queues for long-running persistent jobs, but I'd like to move away from that model since I have doubts about how it'll scale.

I've thought about using rabbit mq, but have a few questions: - it appears that when a job is consumed, it's gone. How would I maintain a record of the jobs & status of the job in rabbitmq? If I wanted to display job updates how would I handle that? I didn't think you could update a message once it's already in the queue

Or am I missing the mark? Do I want to separate the business entity "job" that maintains a status and updates and such from the "thing that dispatches jobs to workers"? And when a worker Has the job it just updates the business entity in the database?

replies(2): >>doktor+4e1 >>Rapzid+ox1

>>blowsk+0f
I’ve watched companies maintain their non-scaling DBs all the way to the grave.

It’s a killer.

If it’s “just a cache” OK. Fine. If you need it for something important like queuing or certain caching scenarios - be aware it won’t scale well.

>>yongji+r31
> but building things "properly" because "we may need it later" is a great way to kill a project with complexity

Emphasis added because I feel like I addressed this in the paragraph immediately after the one you quoted:

> Plus, as me and others keep saying, Redis (and other proper queues) isn't a complex addition to your infra

I'm speaking out of experience and, as I already pointed out in another subthread, Postgres is far more complex than Redis. Consider the presence of "DB Admins/Specialists" and the lack of counterpart thereof for Redis and other queuing solutions.

Of course, if queues are not central to how your platform operates, you might be able to get away with Postgres. I still advise using Redis as a reasonable hedge against someone famous tweeting organically about your service because in this case, you don't want your DB to go down because some queue table had a surplus of transactions (or vice versa).

Not to mention, at an early stage, your tech decisions set precedents for the team. Maybe you have 10K users with a low qps but soon you are sending marketing emails to them and your system has periodic bursts of queue activity for all 10K users at once. When discussing this marketing "feature" rarely anyone thinks, "Hey we can't do that with our Postgres queue", rather "Yeah I saw functions in our codebase for queuing---this is doable". This is a small effort but a huge technical investment for later on.

replies(1): >>rkk3+rP1

>>skytre+s41
> Now this is just a queue service with extra steps running in a relational DB instead of natively as an OS process. You did cite it as just an option but I don't see why this is an attractive option.

Your DB then shares a WAL log with your queue. Meaning a single managed physical replication pipeline for them both. Meaning only one set of leader-election issues to debug, not two. Meaning one canonical way to do geographic high-latency async replication. Meaning disaster recovery brings back a whole-system consistent snapshot state. Etc.

Honestly, if I had my way, every stateful component in the stack would all share a single WAL log. That’s what FoundationDB and the like get you.

> In fact I would argue running a DB (queue or no queue) is just more complex than running a queue service.

Well, yeah, but you usually need a DB. So, if you’re going to be paying the OpEx costs of the DB either way, then you may as well understand it deeply in order to wring the most productive use you can out of each OpEx dollar/man-hour spent.

(I feel the same way about Redis, as it happens: if you need it, and are locking your code into its model anyway, then you may as well take advantage of its more arcane features, like Redis Streams, Lua scripting, etc.)

However, maybe our company is uncommon in how much our service literally is doing fancy complex DB queries that use tons of DB features. We’re a data analytics company. Even the frontend people know arcane SQL here :)

> that ties you in with that library

The difference between what you / apps / abstracting libraries do in Redis, and what they do in an SQL DB, is that in the DB, the shape of everything has to be explained in a vendor-neutral manner: SQL DDL.

Sometimes Redis-based solutions converge on conventional schemas; see e.g. Sidekiq’s informal schema, which several other queuing systems are implemented in terms of. But when they don’t, there’s nothing you can really do — beyond hacking on the libraries involved — to bring them into sync.

In an SQL DB, anything can be adapted into the expected shape of anything else, by defining SQL views. (Heck, in an SQL DB with Redis support, like Postgres with redis_fdw, the Redis data can be adapted into any shape you like using SQL views.)

And that’s further enabled by the fact that the DB had received from the app, through DDL, a schema, that you can examine, manipulate, and refactor; or even synthesize together with other schemas.

> you can't be assured that future versions of your DB would support those features

You can if those features are in the SQL standard. I’ve never heard of a DBMS regressing on its level of SQL standard support.

replies(1): >>skytre+Qf1

>>mikekl+S61
I assume a key-value store.

replies(1): >>hughrr+EY1

>>Jarwai+D71
You might want to look at something like SWF or Temporal

>>derefr+1b1
> Your DB then shares a WAL log with your queue...

I feel like the understanding we can come to here is that we have differing definitions of necessary complexity.

You did mention you work in data analytics and I have worked and am working in more traditional product-oriented gigs. Everything you mentioned are nice, impressive even, but to me they are toys. I don't need a queue with a WAL stream, or PITR backups. Queues, to me, are inter-process communication mechanisms, or a means to concurrency. In fact, worst case, you can delete the queues or restart Redis even without triggering a disk write (though I note this is another feature that comes free with Redis); it would inconvenience our customers but they can always just retry.

Of all the benefits you mentioned, leader-election is the only one I could make a good case for.

> then you may as well understand it deeply in order to wring the most productive use you can out of each OpEx dollar/man-hour spent.

Understanding it is one thing but using that understanding to create/develop (CapEx) and then maintain (OpEx) an in-DB queue feels like unnecessary costs. CapEx to install Redis is practically nil, and you seem to agree that running it is far simpler OpEx than PG too ("Well, yeah, but...").

Of course, I keep in mind your point about libraries doing the dirty work. But if it all boils down to a third party in the end, the point I'm replying to above is rather moot no? And if it comes to a third dependency anyway, I've already reasoned out my preference for an actual queue solution.

> explained in a vendor-neutral manner: SQL DDL

> You can if those features are in the SQL standard. I’ve never heard of a DBMS regressing on its level of SQL standard support.

While this is an ideal case, my experience so far shows that every major DB package relies on their own tweaks to the SQL standard, and so "If those features are in the SQL standard" turns out to be a pretty big "if" to ask. I don't worry about vendors regressing on SQL standard support but rather that the libraries for DB queues are relying on the non-SQL-standard features/behavior to mock out a full queue functionality---non standard behavior that has no guarantee of consistency across versions.

I mean, if there's a DB queue middleware/library that works across DB vendors, be it Postgres, Oracle, Maria, MySQL (with a reasonable storage engine, perhaps) then that's a convincing argument that the SQL standard can support queuing sufficiently. But otherwise, this counterpoint is a hard sell for me.

replies(1): >>derefr+0m1

>>skytre+Qf1
> it would inconvenience our customers but they can always just retry.

We have very different things in our queues :) An MQ, to me, is a place to put stuff that needs reliable, at-least-once delivery. Like, say, invoices. User never receives that, bad things happen.

There's very little difference between that type of data, and the data in your DB, while it's "active." It's only once the data in the queue is fully ACKed and no longer relevant that the data behaves more ephemerally than first-class DB state.

> and you seem to agree that running it is far simpler OpEx than PG too

I did not say that. In our stack, we run a Redis cluster on top of Kubernetes, because it's nominally semi-stateless. In reality, it's our biggest SPOF. It fights the container daemon around memory allocations (why does a Redis container that uses 2GB memory steady-state need a 64GB memory limit to finish starting up!?) and has weird dynamics around startup readiness (i.e. Redis started from a large AOF-with-RDB-prolog file will start serving requests for just a moment, then immediately stop again to finish loading. This confuses k8s into thinking it's done with startup-probes and is ready for readiness-probes. Which it's not.)

Redis, when configured to be both Highly Available and durable (which is not your use-case, I know) is really just less-robust than a DB, with less management tooling, fewer docs, fewer people who know how to scale it, fewer third-party solutions pre-adapted to expect it, etc. Because, to paraphrase: Redis isn't really in the business of being a database. And yet that's exactly what you end up with, when you try to use Redis as a reliable at-least-once MQ: a database. Just, a kinda sucky one. (One that doesn't even have a concept of users with different privileges; let alone RBAC!)

And yet, that actually is the niche of Redis. It's the durable, client-server cousin to in-process peer-mesh state-sync solutions like Hazelcast/Mnesia. Which makes it kind of weird that it's not particularly tuned at solving the problems in the one niche unique to it.

(I have nothing against Redis! It has non-unique use-cases where it works great, e.g. when it's being used as "memcached with more data structures", or as a coordination/locking server instead of Zookeeper, etc. A robust reliable MQ it is not. My argument is that your average RDBMS is closer to being a robust, reliable MQ than Redis is. Though, of course, you'll get the best robustness+reliability for that use-case out of a "real" MQ — Kafka or NATS or something.)

> but rather that the libraries for DB queues are relying on the non-SQL-standard features/behavior to mock out a full queue functionality

They don't have to, because everything you need to implement reliable queuing semantics is already part of [the lowest-common-denominator part of the] SQL standard [that everything that calls itself an RDBMS implements.]

These libs have DB-specific adapters for two reasons:

1. There's no such wire protocol as "SQL" — every RDBMS speaks its own wire protocol to convey your SQL statements to the database and convey result-sets back; and so, unless you're using a language with a common API abstraction for RDBMS client libs to be built against (e.g. JDBC), then each such client lib ends up presenting its own distinct API to callers, which needs a separate adapter written for it.

2. with SQL, reliable queuing is "easy", but knowing about enqueued items (with as minimal an overhead as possible, scanning as few additional items as possible) is hard. SQL says nothing about how to get push-notification-to-the-client type information out of the DB, and so every DBMS implements its own mechanism for this. These are pure optimizations using pure-ephemeral code, though; the schema of the queue in the DB remains the same either way, such that the data could be exported from one RDBMS to a .sql file, imported from that .sql file to a different RDBMS, and the only thing needed on the client side would be switching adapters.

(In theory, presuming a common wire-protocol abstraction interface lib like JDBC, you could have a "standard SQL" adapter. It'd just be annoyingly low-throughput. Like using HTTP polling when you could be using WebSockets.)

>>crazyg+2I
I believe most people simply have no concept of how performant modern RDBMS are. "Joins are slow" and all that.

>>atombe+M61
I think you'd still desire loose ordering even if you didn't care about strict ordering. If you always have 50 items in the queue you could be blowing the latency budget on random items because they are getting passed over where as with some form of loose chronological order you could still be completing everything within a certain amount.

SQS without FIFO works about like this AFAIK.

I agree with the gist of everything you are saying though, particularly the point about signaling. As soon as you want to start keeping track of a process that state is going to be stored somewhere outside the queues.

>>Jarwai+D71
> Do I want to separate the business entity "job" that maintains a status and updates and such from the "thing that dispatches jobs to workers"? And when a worker Has the job it just updates the business entity in the database?

If you ever really need to that's one way to do it but consider I'm imagining 50-100k state updates per second as being reasonable depending on how the data and queries are laid out. You can always use NOTIFY as discussed to cut down on the amount of worker queries for new work before having to completely push signaling to a separate system(which would likely still require occasional polling and/or a background process to re-signal work that hasn't been picked up).

It's interesting to consider that AWS Step Functions charges(or did charge) by the state transition.

>>skytre+2Q
We needed queues. We used Redis. That fits the domain.

Problem was there wasn't a good answer to "How much redis does your team need to know to put it in production".

We thought we knew it well enough, we thought we knew what we were getting into, and we thought so many others are using it for this, we should be good. That is makes a difference, clearly.

>>conrad+vY
Marko Kreen? He still maintains it so I guess some people still use it and the associated logical table distribution system.

>>skytre+f7
> Whatever happened to use the proper data structures for the job? PostgreSQL and MySQL are, at the end of the day, b-trees with indices. Throw in relational properties and/or ACID too. Those aren't properties you need or want in a queue structure.

How is relational storage engine with support for transactions, document/json, ACID, hot-standby "the wrong data model" for a queue?

>>politi+5L
If Redis is your primary datastore then sure. But the problem domain for which "Redis is your primary datastore" is appropriate is relatively narrow.

>>crazyg+2I
I very strongly agree to the idea of using the tools you already have and know to solve the problem at hand and release it. Then observe where it could use help, then seek options on specific products solving those problems and only then can we get to real good questions that can pierce the veil of marketing and the comfort of herd mentality.

>>skytre+k81
> I still advise using Redis as a reasonable hedge against someone famous tweeting

Early stage startups die because of lack of PMF. Diverting focus and resources away from finding PMF kills companies. Most companies should focus on the product, tech debt be damned.

>>blowsk+0f
This. So much this.

>>hughrr+(OP)
One of my businesses is running a software solution build in 2012. Ever since it is using queuing via a database. It's simple, easy to maintain. And scales really well. We are talking about 10k queues per second. Yes it disattracts developers like you. They usually prefer the latest fancy trending tool. luckily those types are often responsible for overengineering things and overly complex solutions.

>>yawara+Gc1
Correct

replies(1): >>mikekl+Vu6

>>hughrr+qo
Not really knowing the details of Redis. What makes it a better key-value store than a Postgres table with a key-column and a value-column?

>>rantwa+r01
The blog post you mentions brings two arguments: first the database would require looking. Second the database would need tradeoffs between reading (polling) and writing (adding and removing to the queue).

The original article handles the first argument: Postgres doesn't need polling. Instead it provides a notify mechanism, that informs the application when the table changed (something was added or removed from the queue) via the SQL NOTIFY statement.

For the second point it also provides a solution: since optimization for reading is not needed anymore with the NOTIFY statement, the trade-off like different: we now need an efficient way to write. For this the article provides an efficient update statement with special lock behavior. This helps to make writes efficient, too.

It looks like both points from the blog post you linked are handled in the original article.

>>hughrr+EY1
Perhaps I misinterpreted this part?

> I struggle to consider using it as a data store these days as well with other options on the table.

It sounds like you wouldn't use it for any purposes.

>>aidos+Rf
If you have the case, if A then do B and C. You can commit to a topic and let B and C be triggered separately. This does not solve cases like storing to a db and then send a db-updated event.