zlacker

[return to "Do you really need Redis? How to get away with just PostgreSQL"]
1. _ugfj+z2[view] [source] 2021-06-12 07:29:54
>>hyzyla+(OP)
You really don't need anything fancy to implement a queue using SQL. You need a table with a primary id and a "status" field. An "expired" field can be used instead of the "status". We used the latter because it allows easy retries.

1. SELECT item_id WHERE expire = 0. If this is empty, no items are available.

2. UPDATE SET expire = some_future_time WHERE item_id = $selected_item_id AND expire = 0. Then check whether the UPDATE affected any rows. If it did, item_id is yours. If not, loop back to step 1. If the database has a sane optimizer, it'll note that at most one row needs locking, since the primary id is given.

All this needs is a very weak property: document level atomic UPDATE which can return whether it changed anything. (How weak? MongoDB could do that in 2009.)
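The two-step claim pattern above can be sketched in a few lines. This is an illustrative sketch only, using SQLite for portability; the table and column names (`queue`, `item_id`, `expire`) and the `claim` helper are assumptions, not from the original Drupal code.

```python
import sqlite3

# Minimal queue table: a primary id plus an "expire" field,
# where expire = 0 means the item is unclaimed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (item_id INTEGER PRIMARY KEY, "
    "expire INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO queue (item_id) VALUES (1)")

def claim(conn, item_id, lease_until):
    # Step 2: the conditional UPDATE succeeds only if nobody else
    # claimed the row first; rowcount tells us whether we won.
    cur = conn.execute(
        "UPDATE queue SET expire = ? WHERE item_id = ? AND expire = 0",
        (lease_until, item_id),
    )
    return cur.rowcount == 1

# Step 1: find a candidate item.
row = conn.execute("SELECT item_id FROM queue WHERE expire = 0").fetchone()
print(claim(conn, row[0], 9999))  # → True: this worker owns the item
print(claim(conn, row[0], 9999))  # → False: already claimed, loop again
```

Because `expire` holds a timestamp rather than a boolean status, a worker that crashes simply lets its lease lapse, which is what makes retries easy.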

Source code at https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/... (We cooked this up for Drupal in 2009 but I am reasonably sure we didn't invent anything new.)

Of course, this is not the fastest job queue there is but it is quite often good enough.

2. Twisol+Xa[view] [source] 2021-06-12 09:02:38
>>_ugfj+z2
For other readers, the `UPDATE` step is an exact analogue of the "compare-and-swap" atomic instruction [0]. It's really cool to see how you've realized it in SQL!

As a capability, compare-and-swap has an infinite consensus number [1], meaning it's sufficient to implement wait-free consensus algorithms with an arbitrary number of participants. That makes it a perfect fit for managing a scalable pool of workers that need to coordinate on consuming from a queue!

[0] https://en.wikipedia.org/wiki/Compare-and-swap

[1] https://en.wikipedia.org/wiki/Consensus_(computer_science)#C...
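To make the analogy concrete, here is a toy compare-and-set cell; the `CASCell` class and its names are purely illustrative, not from either comment.

```python
import threading

class CASCell:
    """A single value supporting an atomic compare-and-set."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        # Atomically: if the current value equals `expected`,
        # install `new` and report success; otherwise fail.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

# The SQL step `UPDATE ... SET expire = t WHERE ... AND expire = 0`
# behaves like compare_and_set(0, t) on the row's expire field.
cell = CASCell(0)
print(cell.compare_and_set(0, 9999))  # → True: this worker claimed it
print(cell.compare_and_set(0, 8888))  # → False: someone got there first
```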

3. halifa+Il[view] [source] 2021-06-12 11:05:50
>>Twisol+Xa
Postgres’s transactional semantics are really useful when building a queue, because the insert, the state update, and the notification can all take part in the same transaction.

Connection 1

  LISTEN "job-update";
Connection 2

  BEGIN;
  INSERT INTO jobs VALUES ('a-uuid', …);
  SELECT PG_NOTIFY('job-update', 'json blob containing uuid and state change info');
  COMMIT;
Connection 3 (used when Connection 1 is notified)

  BEGIN;
  SELECT id, … FROM jobs WHERE id = 'a-uuid' FOR UPDATE SKIP LOCKED;
  UPDATE jobs SET state = 'step1_completed' WHERE id = 'a-uuid';
  SELECT PG_NOTIFY('job-update', 'json blob containing uuid and state change info');
  -- do the thing here: computation, calling external API, etc. If it fails then rollback.
  COMMIT;
Because NOTIFY has transactional semantics, the notification only goes out at transaction commit time. You want to use a dedicated connection for listening to notifications.

The only downsides I can immediately think of are that every worker will contend to lock that row, and that you’ll need to write periodic jobs to clean up and retry failures.
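That periodic cleanup/retry job can be a single statement if you use the expire-timestamp scheme from the first comment: reset any lease that has lapsed so the job becomes claimable again. A sketch, again in SQLite for illustration; the `jobs` schema and `sweep` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, "
    "expire INTEGER NOT NULL DEFAULT 0)"
)
# A job that was claimed with a lease expiring at time 100:
conn.execute("INSERT INTO jobs (id, expire) VALUES (1, 100)")

def sweep(conn, now):
    # Reset stale leases (claimed, but past their expiry) so the
    # job becomes visible to workers again.
    cur = conn.execute(
        "UPDATE jobs SET expire = 0 WHERE expire != 0 AND expire < ?",
        (now,),
    )
    return cur.rowcount

print(sweep(conn, now=200))  # → 1: one stale job was reset for retry
```

Run this on a timer (cron, pg_cron, or a worker loop); the same conditional-UPDATE atomicity that makes claiming safe also makes the sweep safe to run concurrently.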
