zlacker

[return to "Do you really need Redis? How to get away with just PostgreSQL"]
1. _ugfj+z2[view] [source] 2021-06-12 07:29:54
>>hyzyla+(OP)
You really don't need anything fancy to implement a queue using SQL. You need a table with a primary id and a "status" field. An "expired" field can be used instead of the "status". We used the latter because it allows easy retries.

1. SELECT item_id WHERE expire = 0. If this is empty, no items are available.

2. UPDATE SET expire = some_future_time WHERE item_id = $selected_item_id AND expire = 0. Then check whether UPDATE affected any rows. If it did, item_id is yours. If not, loop. If the database has a sane optimizer it'll note at most one document needs locking as the primary id is given.

All this needs is a very weak property: document level atomic UPDATE which can return whether it changed anything. (How weak? MongoDB could do that in 2009.)

Source code at https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/... (We cooked this up for Drupal in 2009 but I am reasonably sure we didn't invent anything new.)

Of course, this is not the fastest job queue there is but it is quite often good enough.

◧◩
2. hughrr+gc[view] [source] 2021-06-12 09:18:43
>>_ugfj+z2
Oh please stop using databases as queues. I spent a disproportionate amount of time in the last 20 years undoing that decision. It doesn’t scale at all well.
◧◩◪
3. cerved+vn[view] [source] 2021-06-12 11:28:11
>>hughrr+gc
In your experience, roughly at which point did it stop scaling and why?
◧◩◪◨
4. rmetzl+Zq[view] [source] 2021-06-12 12:13:43
>>cerved+vn
I also have a lot of issues with people using tables and cronjobs instead of just using queues.

I once was the one implementing this stupid idea. It's very easy to shoot yourself in the foot. It seems to be ok at first, but it is a stupid idea and will give nightmares to the operations team.

It works as long as the table is small. It gets especially problematic when there are sudden spikes in the workload, so that the table is growing rapidly. People don't test these scenarios enough.

Often it's important to do the jobs in the order they came in. This means you need to have the index set up this way or you'll end up with full table scans. You also need to remove jobs which are done, but this will also mess with the tables performance.

There are tricks with databases to help scaling this, but at the end of the day it's much simpler to just use a queue.

[go to top]