zlacker

[return to "Do you really need Redis? How to get away with just PostgreSQL"]
1. _ugfj+z2 2021-06-12 07:29:54
>>hyzyla+(OP)
You really don't need anything fancy to implement a queue using SQL. You need a table with a primary id and a "status" field. An "expire" field can be used instead of "status". We used the latter because it allows easy retries.

1. SELECT item_id WHERE expire = 0. If this is empty, no items are available.

2. UPDATE SET expire = some_future_time WHERE item_id = $selected_item_id AND expire = 0. Then check whether UPDATE affected any rows. If it did, item_id is yours. If not, loop. If the database has a sane optimizer it'll note at most one document needs locking as the primary id is given.

All this needs is a very weak property: document level atomic UPDATE which can return whether it changed anything. (How weak? MongoDB could do that in 2009.)
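The two steps above can be sketched in Python with the standard `sqlite3` module. This is only a minimal illustration, not the Drupal code linked in this comment: the table name `queue`, the `data` column, `LEASE_SECONDS`, and every function name are assumptions made for the example. `release_expired` is one guess at how an "expire" field could allow retries.

```python
import sqlite3
import time

LEASE_SECONDS = 30  # hypothetical lease length, not from the original post


def create_queue(conn):
    # expire = 0 means "unclaimed"; any other value is the lease deadline.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " item_id INTEGER PRIMARY KEY,"
        " data TEXT NOT NULL,"
        " expire INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def add_item(conn, data):
    cur = conn.execute("INSERT INTO queue (data) VALUES (?)", (data,))
    conn.commit()
    return cur.lastrowid


def claim_item(conn, now=None):
    """Return (item_id, data) for a claimed item, or None if none is free."""
    now = int(time.time()) if now is None else now
    while True:
        # Step 1: look for an unclaimed item.
        row = conn.execute(
            "SELECT item_id, data FROM queue WHERE expire = 0 "
            "ORDER BY item_id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        item_id, data = row
        # Step 2: conditional UPDATE; it only matches if nobody claimed first.
        cur = conn.execute(
            "UPDATE queue SET expire = ? WHERE item_id = ? AND expire = 0",
            (now + LEASE_SECONDS, item_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return item_id, data
        # Another worker won the race; loop and try the next item.


def release_expired(conn, now=None):
    """Make items whose lease has run out claimable again (assumed retry step)."""
    now = int(time.time()) if now is None else now
    cur = conn.execute(
        "UPDATE queue SET expire = 0 WHERE expire != 0 AND expire < ?", (now,)
    )
    conn.commit()
    return cur.rowcount


def delete_item(conn, item_id):
    # Called by the worker once the job has finished successfully.
    conn.execute("DELETE FROM queue WHERE item_id = ?", (item_id,))
    conn.commit()
```

A worker would call `claim_item`, do the work, then `delete_item`. If the worker dies, the row stays claimed until something like `release_expired` resets it.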

Source code at https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/... (We cooked this up for Drupal in 2009 but I am reasonably sure we didn't invent anything new.)

Of course, this is not the fastest job queue there is but it is quite often good enough.

2. hughrr+gc 2021-06-12 09:18:43
>>_ugfj+z2
Oh please stop using databases as queues. I spent a disproportionate amount of time in the last 20 years undoing that decision. It doesn’t scale at all well.
3. atombe+fB 2021-06-12 13:58:55
>>hughrr+gc
You're not wrong, but every solution involves compromises. More broadly, I would argue, though, that almost nobody actually needs queues.

Sure, there are cases where you actually need strict ordering, but they are relatively few, in my opinion, involving things like transaction processing or other areas where the order of operations changes the effect of the operations and must be strictly sequential.

Ordering itself brings significant technical challenges. For example: With a queue, you can only have strict ordering as long as you only process one item at a time, with no concurrency. Ordering also complicates handling of failures and retrying, because it means every newer item has to wait until the failed item has been dealt with; if you use things like dead letter queues, order is violated.

In almost all use cases -- anything from image processing to newsletter delivery -- a better system is to have an unordered list of items that anyone can take items out of, in any order. Then you throw as many workers as you need at this system and design it so that there's minimal locking needed to prevent workers from stepping on each other's toes.

There are queue systems that have a purpose in a stack, specifically pub/sub brokers, which can be used to schedule work along several dimensions. You use the pub/sub system to signal work to be done, but each payload just refers to some state that's stored elsewhere.

4. hughrr+NC 2021-06-12 14:14:21
>>atombe+fB
I think you're missing something important here by applying the worker pool simplification to the idea. Order is important in that case as well. It forces fair scheduling of the work, and the metrics from the queue can easily describe the overall latency of the work being done as well as the capacity and utilisation. The latter can be used to scale the consumers of the queue and/or alert when things aren't working properly.

If you throw a proper messaging broker on the table like RabbitMQ then you're getting message acknowledgement as well, which is important if you must do the work. A failed consumer will result in the work being rescheduled on another consumer transparently. Of course there are metrics you can watch to get a heads-up when your consumers are not working.
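To make the acknowledgement idea concrete, here is a toy in-memory model in Python. It is not RabbitMQ's API or its actual implementation; `ToyBroker` and all of its method names are hypothetical, and the requeue-at-the-front behaviour is an assumption chosen to keep ordering simple. It only shows the shape of "deliver, wait for ack, requeue if the consumer fails".

```python
from collections import deque


class ToyBroker:
    """Hypothetical broker: messages stay tracked until acknowledged."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._unacked = {}      # tag -> (consumer, message)
        self._next_tag = 1

    def publish(self, message):
        self._ready.append(message)

    def deliver(self, consumer):
        """Hand the oldest ready message to a consumer, or return None."""
        if not self._ready:
            return None
        message = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self._unacked[tag] = (consumer, message)
        return tag, message

    def ack(self, tag):
        # The consumer finished the work; the broker can forget the message.
        self._unacked.pop(tag, None)

    def consumer_failed(self, consumer):
        """Requeue everything the failed consumer held; return how many."""
        lost = [(tag, msg) for tag, (c, msg) in self._unacked.items()
                if c == consumer]
        # Push back newest first so the oldest ends up at the front again.
        for tag, msg in sorted(lost, reverse=True):
            del self._unacked[tag]
            self._ready.appendleft(msg)
        return len(lost)

    def depth(self):
        """(ready, unacked) counts, the kind of metric one might alert on."""
        return len(self._ready), len(self._unacked)
```

In this model, work handed to a consumer that dies is picked up by the next `deliver` call from any other consumer, and `depth` gives a crude view of backlog and in-flight work.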

Ergo, don't simplify something too soon. Even reliable delivery of newsletters is rather important; otherwise, why would all the tracking pixels exist? (They are a form of message acknowledgement.)
