zlacker

[return to "On SQS"]
1. andrew+7a[view] [source] 2019-05-27 08:51:45
>>mpweih+(OP)
I use Postgres SKIP LOCKED as a queue.

I used to use SQS but Postgres gives me everything I want. I can also do priority queueing and sorting.

I gave up on SQS when it couldn't be accessed from a VPC. AWS might have fixed that now.

All the other queueing mechanisms I investigated were dramatically more complex and heavyweight than Postgres SKIP LOCKED.
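The pattern the parent describes can be sketched roughly like this; the table and column names here are illustrative, not from the comment:

```sql
-- Hypothetical jobs table (names are made up for illustration).
CREATE TABLE jobs (
  id         bigserial   PRIMARY KEY,
  priority   int         NOT NULL DEFAULT 0,
  payload    jsonb       NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

-- Inside a transaction: claim the next job, skipping rows that
-- other consumers have already locked instead of blocking on them.
BEGIN;
SELECT id, payload
FROM jobs
ORDER BY priority DESC, created_at   -- priority queueing and sorting
FOR UPDATE SKIP LOCKED
LIMIT 1;
-- ... process the job, remove or mark it, then COMMIT ...
COMMIT;
```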

2. soroko+Ll[view] [source] 2019-05-27 11:23:30
>>andrew+7a
Using SKIP LOCKED, do you commit the change to the dequeued item (ack it) at the point where you exit the DB call? If so, what happens if the instance that dequeued the messages crashes?
3. napste+ly[view] [source] 2019-05-27 13:27:25
>>soroko+Ll
Not GP, but I don't think this would be a problem. The consumer dequeues with SELECT ... FOR UPDATE within a transaction. If it crashes, the database rolls back the transaction, and another consumer can then select the work unit.

As for acking, I see two common methods: an additional boolean column, something like is_processed, which consumers skip when it's true; or, after the work is done, simply deleting the entry or moving it elsewhere (e.g. for archival/auditing).
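Both ack styles are easy to sketch; this assumes the same hypothetical jobs table and an optional jobs_archive table:

```sql
-- (a) Flag column: mark done; the dequeue query then adds
--     WHERE NOT is_processed to skip completed rows.
UPDATE jobs SET is_processed = true WHERE id = $1;

-- (b) Delete on completion, optionally moving the row to an
--     archive table in the same statement for auditing.
WITH done AS (
  DELETE FROM jobs WHERE id = $1 RETURNING *
)
INSERT INTO jobs_archive SELECT * FROM done;
```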

4. soroko+xy[view] [source] 2019-05-27 13:29:52
>>napste+ly
My question assumed a scenario where a consumer dequeues a batch, commits the dequeued change, and then crashes while processing the batch.

Of course one could delay the commit until all processing is completed, but then reasoning about the queue throughput becomes tricky.

5. napste+6A[view] [source] 2019-05-27 13:42:31
>>soroko+xy
That's the challenge of distributed systems :) it really boils down to how you want failures to be handled.

If you ack before processing, and then you crash, those messages are lost (assuming you can't recover from the crash and you are not using something like a two-phase commit).

If you ack after processing, you may fail after the messages have been processed but before you've been able to ack them. This leads to duplicates, in which case you better hope your work units are idempotent. If they are not, you can always keep a separate table of message IDs that have been processed, and check against it.
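The separate-table idempotency guard mentioned above might look like this (hypothetical schema; the trick is relying on the primary key to reject duplicates):

```sql
-- Idempotency guard: one row per message ever processed.
CREATE TABLE processed_messages (
  message_id   text        PRIMARY KEY,
  processed_at timestamptz NOT NULL DEFAULT now()
);

-- In the same transaction as the work's side effects:
INSERT INTO processed_messages (message_id)
VALUES ($1)
ON CONFLICT (message_id) DO NOTHING;
-- If this inserted zero rows, the message was already handled: skip it.
```

Doing the insert in the same transaction as the side effects is what makes the check reliable; a crash rolls back both together.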

Either way, it's hard, complex and there are thousands of intermediate failure cases you have to think about. And for each possible solution (2pc, separate table of message IDs for idempotency, etc) you bring more complexity and problems to the table.

6. soroko+JD[view] [source] 2019-05-27 14:11:16
>>napste+6A
Well, SQS has machinery that deals with this (in-flight messages, visibility timeouts) "out of the box". Similar functionality needs to be handcrafted when using the DB as a queue.

To be clear, it is not that the SKIP LOCKED solution is invalid, it is just that there are scenarios where it is not sufficient.
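For what it's worth, the handcrafted equivalent of a visibility timeout is usually a lease column; a rough sketch, assuming a hypothetical jobs table with a nullable locked_until column:

```sql
-- Claim a job and commit immediately, SQS-style: the row stays in the
-- table but is invisible to other consumers until the lease expires.
UPDATE jobs
SET locked_until = now() + interval '30 seconds'
WHERE id = (
  SELECT id FROM jobs
  WHERE locked_until IS NULL OR locked_until < now()
  ORDER BY created_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
RETURNING id, payload;
-- If the consumer crashes, the job becomes visible again once
-- locked_until passes, much like an SQS visibility timeout expiring.
```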
