On SQS - zlacker

>>mpweih+(OP)
I use Postgres SKIP LOCKED as a queue.

I used to use SQS but Postgres gives me everything I want. I can also do priority queueing and sorting.

I gave up on SQS when it couldn't be accessed from a VPC. AWS might have fixed that now.

All the other queueing mechanisms I investigated were dramatically more complex and heavyweight than Postgres SKIP LOCKED.
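
A minimal sketch of what such a dequeue can look like, assuming a hypothetical `jobs` table with `id`, `priority` and `payload` columns. The SQL string is ordinary Postgres (`FOR UPDATE SKIP LOCKED` has existed since 9.5). The Python class is not a Postgres client. It is only a toy in-memory model of the semantics: each worker takes the highest-priority row that no other worker has locked, instead of blocking behind someone else's lock.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical Postgres query; the table and column names are assumptions.
DEQUEUE_SQL = """
SELECT id, payload
FROM jobs
ORDER BY priority, id
LIMIT 1
FOR UPDATE SKIP LOCKED
"""


@dataclass
class Job:
    id: int
    priority: int  # lower value is dequeued first (assumed convention)
    payload: str


class ToyQueue:
    """In-memory model of a jobs table under SKIP LOCKED; not Postgres."""

    def __init__(self, jobs):
        self.rows = {job.id: job for job in jobs}
        self.locks = {}  # job id -> worker whose open transaction holds the row lock

    def dequeue(self, worker: str) -> Optional[Job]:
        # Visit rows in ORDER BY priority, id order.
        for job in sorted(self.rows.values(), key=lambda j: (j.priority, j.id)):
            if job.id in self.locks:
                continue  # SKIP LOCKED: pass over rows other workers hold
            self.locks[job.id] = worker  # FOR UPDATE: lock the row we take
            return job
        return None  # every remaining row is locked, or the table is empty

    def commit_delete(self, worker: str, job_id: int) -> None:
        # DELETE the row and COMMIT, which also releases the lock.
        if self.locks.get(job_id) != worker:
            raise RuntimeError("row is not locked by this worker")
        del self.rows[job_id]
        del self.locks[job_id]

    def rollback(self, worker: str) -> None:
        # ROLLBACK (or a dropped connection) releases all of the worker's locks.
        for job_id in [j for j, w in self.locks.items() if w == worker]:
            del self.locks[job_id]


q = ToyQueue([Job(1, 5, "low"), Job(2, 1, "urgent"), Job(3, 1, "urgent-too")])
print(q.dequeue("worker-a").id)  # 2: highest priority, lowest id
print(q.dequeue("worker-b").id)  # 3: job 2 is locked, so it is skipped, not waited on
```

Priority and sorting come for free from the `ORDER BY`; the `SKIP LOCKED` part is what lets many consumers poll the same table without queueing up behind each other.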

>>andrew+7a
When using SKIP LOCKED, do you commit the change to the dequeued item (i.e. ack it) at the point where you exit the DB call? If so, what happens if the instance that dequeued the messages crashes?

>>soroko+Ll
Not GP, but I think this wouldn't be a problem. The consumer dequeues with SELECT FOR UPDATE within a transaction. If it crashes, the database rolls back the transaction, and another consumer can then select the work unit.

As for acking, I see two common methods. One is an additional boolean column, something like is_processed, and consumers skip rows where it is true. The other is to simply delete the entry after the work is done, or move it elsewhere (e.g. for archival or auditing).
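
Both acking styles fit in a few statements. Again `sqlite3` stands in for Postgres so the sketch runs anywhere; the `jobs` and `jobs_archive` tables and the `is_processed` column are hypothetical. In the Postgres version the row would first have been claimed with `FOR UPDATE SKIP LOCKED` in the same transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    is_processed BOOLEAN NOT NULL DEFAULT 0
);
CREATE TABLE jobs_archive (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
INSERT INTO jobs (id, payload) VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")


def ack_with_flag(conn, job_id):
    # Style 1: keep the row and flip a flag; consumers only select unprocessed rows.
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE jobs SET is_processed = 1 WHERE id = ?", (job_id,))


def ack_by_archiving(conn, job_id):
    # Style 2: move the row to an archive table (or just DELETE it).
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO jobs_archive (id, payload) "
            "SELECT id, payload FROM jobs WHERE id = ?",
            (job_id,),
        )
        conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))


ack_with_flag(db, 1)
ack_by_archiving(db, 2)

pending = db.execute("SELECT id FROM jobs WHERE NOT is_processed ORDER BY id").fetchall()
archived = db.execute("SELECT id FROM jobs_archive").fetchall()
print(pending, archived)  # [(3,)] [(2,)]
```

The flag keeps history in place but leaves processed rows for every dequeue query to filter past; deleting or archiving keeps the hot table small.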

>>napste+ly
My question assumed a scenario where a consumer dequeues a batch, commits the dequeued change, and then crashes while processing the batch.

Of course, one could delay the commit until all processing is completed, but then reasoning about the queue throughput becomes tricky.

>>soroko+xy
You'd have the same problem with SQS, wouldn't you? The act of dequeueing does not guarantee that the process that received a message will actually perform it.

If you want a reliable system along those lines, then you need to use SKIP LOCKED to SELECT one row to lock, then process it, and then DELETE the row. If your process dies, the lock will be released. You still have a new flavor of the same problem: you might process a message twice because the process might die in between completing processing and deleting the row.

You could add complexity: first use SKIP LOCKED to SELECT and lock one row, and UPDATE it to mark it in progress. Then, if the process dies, another process can check whether the job was performed (and clean up the leftover row) or not (and pick up and perform the job). It's a two-phase commit, essentially.
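
The in-progress-marker variant can be sketched as follows, with `sqlite3` standing in for Postgres. Everything here is hypothetical: the `jobs` table with `status` and `worker` columns, the `results` table used as evidence that a job was performed, and the helper names. `claim` takes an explicit id where the Postgres version would pick the row with `FOR UPDATE SKIP LOCKED`, and how a dead worker is detected (a lease timeout, a heartbeat, or noticing that its row lock is gone) is left out; `recover` is simply told which worker died.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',  -- 'pending' or 'in_progress'
    worker TEXT                              -- who claimed it, if anyone
);
CREATE TABLE results (job_id INTEGER PRIMARY KEY, output TEXT NOT NULL);
INSERT INTO jobs (id) VALUES (1), (2);
""")


def claim(conn, job_id, worker):
    # Phase 1: durably mark the job as in progress before doing any work.
    with conn:
        conn.execute(
            "UPDATE jobs SET status = 'in_progress', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker, job_id),
        )


def record_result(conn, job_id, output):
    # The job's effect, committed on its own.
    with conn:
        conn.execute("INSERT INTO results (job_id, output) VALUES (?, ?)", (job_id, output))


def finish(conn, job_id):
    # Phase 2: remove the queue entry once the work is durable.
    with conn:
        conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))


def recover(conn, dead_worker):
    # Run by another process once it knows dead_worker is gone.
    with conn:
        stuck = conn.execute(
            "SELECT id FROM jobs WHERE status = 'in_progress' AND worker = ?",
            (dead_worker,),
        ).fetchall()
        for (job_id,) in stuck:
            done = conn.execute("SELECT 1 FROM results WHERE job_id = ?", (job_id,)).fetchone()
            if done:
                # The work was performed: only the queue entry is garbage.
                conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
            else:
                # The work never finished: release the job for another worker.
                conn.execute(
                    "UPDATE jobs SET status = 'pending', worker = NULL WHERE id = ?",
                    (job_id,),
                )


# Worker "w1" claims both jobs, completes job 1's work, then dies before finish().
claim(db, 1, "w1")
claim(db, 2, "w1")
record_result(db, 1, "ok")
recover(db, "w1")
print(db.execute("SELECT id, status FROM jobs").fetchall())  # [(2, 'pending')]
```

The "was it performed?" check only works because `results` lives in the same database. If the job's effect is external (an email, an HTTP call), that check needs its own idempotency record, and the duplicate-processing window comes back.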

Factor out PG and you'll see that the problem is similar no matter the implementation.

>>crypto+tA1
> you might process a message twice because the process might die in between completing processing and deleting the row

The very handy thing about the setup described is that your data tables are part of the same MVCC world-state as your message queue. So you do all the work for the job in the context of the same MVCC transaction that is holding the job locked, and anything that causes the job to fail will fail the entire transaction, and thus roll back any changes that the job's operation made to the data.
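
The same-transaction pattern can be sketched with `sqlite3` standing in for Postgres. There is no `SKIP LOCKED` here, so the row-claiming step is omitted and only the atomicity is shown. The `jobs` and `accounts` tables and the `process` function are hypothetical. The queue row is deleted and the job's data change is written in one transaction, so a failure midway undoes both: the job is neither lost nor half-applied.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE jobs (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL, amount INTEGER NOT NULL);
INSERT INTO accounts (id, balance) VALUES (1, 100);
INSERT INTO jobs (id, account_id, amount) VALUES (1, 1, 30), (2, 1, -500);
""")


def process(conn, job_id):
    """Ack one job and apply its data change in a single transaction."""
    with conn:  # COMMIT on success, ROLLBACK on any exception
        account_id, amount = conn.execute(
            "SELECT account_id, amount FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))  # the ack
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, account_id)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if balance < 0:
            raise ValueError(f"job {job_id} would overdraw account {account_id}")


process(db, 1)  # succeeds: balance becomes 130 and job 1 is gone
try:
    process(db, 2)  # fails: the DELETE and the UPDATE are both rolled back
except ValueError:
    pass

print(db.execute("SELECT balance FROM accounts").fetchone())  # (130,)
print(db.execute("SELECT id FROM jobs").fetchall())  # [(2,)]
```

The cost is the one raised earlier in the thread: the job's row stays locked, and its transaction stays open, for as long as the work takes.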