zlacker

That's the challenge of distributed systems :) it really boils down to how you want failures to be handled.

If you ack before processing, and then you crash, those messages are lost (assuming you can't recover from the crash and you are not using something like a two-phase commit).

If you ack after processing, you may fail after the messages have been processed but before you've been able to ack them. This leads to duplicates, in which case you better hope your work units are idempotent. If they are not, you can always keep a separate table of message IDs that have been processed, and check against it.

Either way, it's hard, complex and there are thousands of intermediate failure cases you have to think about. And for each possible solution (2pc, separate table of message IDs for idempotency, etc) you bring more complexity and problems to the table.

replies(1): >>soroko+D3

>>napste+(OP)
Well, sqs has machinery that deals with this (in flight messages, visibility timeouts) "out of the box". Similar functionality needs to be handcrafted when using dB as a queue.

To be clear, it is not that the SKIP LOCKED solution is invalid, it is just that there are scenarios where it is not sufficient.

replies(1): >>andrew+Z5

>>soroko+D3
Refer to the example I posted as a reply, above.