zlacker

[parent] [thread] 11 comments
1. matsem+(OP)[view] [source] 2023-09-24 20:52:21
No, just update with set taken=1 and check the affected-row count. If the statement changed the row, you claimed the task; if it didn't, someone else updated it before you.

Our tasks were quick enough that everything already fetched could always finish before a scale-down, new deploy, etc.; the worker stopped fetching new tasks when the shutdown signal came and just finished what it had. As I noted above, we did have logic to monitor for tasks that got taken but never reached a finished status, but I can't remember it ever actually reporting anything.
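The claim-by-update pattern described above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the table and column names (`tasks`, `taken`) are assumptions, not the original schema:

```python
import sqlite3

# In-memory stand-in for the real database; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, taken INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO tasks (id) VALUES (1)")

def try_claim(conn, task_id):
    # The UPDATE only matches a row nobody has claimed yet; the
    # affected-row count tells us whether we won the race.
    cur = conn.execute(
        "UPDATE tasks SET taken = 1 WHERE id = ? AND taken = 0", (task_id,)
    )
    return cur.rowcount == 1

print(try_claim(conn, 1))  # True: we claimed the task
print(try_claim(conn, 1))  # False: the row was already taken
```

Every worker can run the same statement concurrently; exactly one of them sees an affected-row count of 1 for a given task.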

replies(3): >>fbdab1+u >>fsnipe+F2 >>SahAss+z3
2. fbdab1+u[view] [source] 2023-09-24 20:56:15
>>matsem+(OP)
I would set the taken field to a timestamp. Then you could have a cleanup job that looks for any lingering jobs aged past a reasonable timeout and nulls out the field.
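A rough sketch of that variant, again with hypothetical names (`taken_at`, `TIMEOUT_SECS`) and an in-memory SQLite database standing in for the real one:

```python
import sqlite3
import time

TIMEOUT_SECS = 300  # the assumed "reasonable timeout"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, taken_at REAL)")
# One claim that is ten minutes old, one that is fresh.
conn.execute("INSERT INTO tasks (id, taken_at) VALUES (1, ?)", (time.time() - 600,))
conn.execute("INSERT INTO tasks (id, taken_at) VALUES (2, ?)", (time.time(),))

def release_stale(conn, now):
    # Cleanup job: null out claims that lingered past the timeout so
    # those tasks become available again.
    cur = conn.execute(
        "UPDATE tasks SET taken_at = NULL "
        "WHERE taken_at IS NOT NULL AND taken_at < ?",
        (now - TIMEOUT_SECS,),
    )
    return cur.rowcount

print(release_stale(conn, time.time()))  # frees the ten-minute-old claim
```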
replies(3): >>tylerg+m1 >>Izkata+mC >>magica+q01
3. tylerg+m1[view] [source] [discussion] 2023-09-24 21:02:35
>>fbdab1+u
It won't work with a timestamp, because each write happens at a different time and so always reports an affected-row count of 1. Setting a boolean is static.
replies(3): >>jayd16+T2 >>twic+F7 >>AdamJa+Yj
4. fsnipe+F2[view] [source] 2023-09-24 21:12:53
>>matsem+(OP)
You can combine this "update" with a "where taken = 0" to directly skip taken rows.
5. jayd16+T2[view] [source] [discussion] 2023-09-24 21:15:05
>>tylerg+m1
You can do something like UPDATE row SET timeout = NOW() WHERE NOW() - taskTimeout > row.timestamp. You're not stuck with comparing bools.
6. SahAss+z3[view] [source] 2023-09-24 21:20:06
>>matsem+(OP)
That is the sort of thing that bites you hard when it finally does. It might run perfectly for years, but one period of flaky downtime at a third party or a slightly misconfigured DNS will get you eventually.
replies(1): >>matsem+Y6
7. matsem+Y6[view] [source] [discussion] 2023-09-24 21:48:54
>>SahAss+z3
But compared to the rabbit setup where I work now, it was dead stable: no lost tasks, and no extra engineering effort maintaining yet another piece of tech. Our rabbit cluster acting up has led to multiple disasters lately.
replies(1): >>SahAss+O8
8. twic+F7[view] [source] [discussion] 2023-09-24 21:54:36
>>tylerg+m1
update tasks set taken_timestamp = now() where task_id = ? and taken_timestamp is null
9. SahAss+O8[view] [source] [discussion] 2023-09-24 22:05:58
>>matsem+Y6
Agreed, I've had my own rabbit nightmares. But setting up a more robust queue on PostgreSQL is easy, so you can gain a lot more guarantees without much added complexity.
10. AdamJa+Yj[view] [source] [discussion] 2023-09-25 00:01:48
>>tylerg+m1
update row set taken=true,taken_by=my_id,taken_at=now() where taken is false;
11. Izkata+mC[view] [source] [discussion] 2023-09-25 03:40:24
>>fbdab1+u
We do it with two columns, one is an integer identifying which process took the job and the second is the timestamp for when it was taken.
12. magica+q01[view] [source] [discussion] 2023-09-25 08:38:57
>>fbdab1+u
We have a "status flag" column which is either Available, Locked or Processed (A, L and P), an Updated column with a timestamp of when it was last updated, and a Version counter.

When grabbing a new message it selects "Available or (Locked with Updated timestamp older than the configured timeout)". If successful, it immediately tries to set the Locked status and the Updated timestamp and bumps the Version counter, where the previous values of Status and Version have to match. If the update fails it retries getting a new message.

If the Version counter is too high, it moves the message to the associated dead-letter table, and retries getting a new message.
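A sketch of that grab loop, assuming an in-memory SQLite stand-in and illustrative names (`messages`, `dead_letters`, `TIMEOUT`, `MAX_VERSION`); the real schema and limits are not from the original:

```python
import sqlite3
import time

TIMEOUT = 300      # assumed lock timeout in seconds
MAX_VERSION = 5    # assumed dead-letter threshold

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, status TEXT, "
    "updated REAL, version INTEGER)"
)
conn.execute("CREATE TABLE dead_letters (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO messages VALUES (1, 'A', 0, 0)")

def grab(conn, now):
    while True:
        # Available, or Locked with an Updated timestamp past the timeout.
        row = conn.execute(
            "SELECT id, status, version FROM messages "
            "WHERE status = 'A' OR (status = 'L' AND updated < ?) LIMIT 1",
            (now - TIMEOUT,),
        ).fetchone()
        if row is None:
            return None
        msg_id, status, version = row
        if version + 1 > MAX_VERSION:
            # Version counter too high: dead-letter and retry.
            conn.execute("DELETE FROM messages WHERE id = ?", (msg_id,))
            conn.execute("INSERT INTO dead_letters VALUES (?)", (msg_id,))
            continue
        # Optimistic lock: the previous Status and Version must still match.
        cur = conn.execute(
            "UPDATE messages SET status = 'L', updated = ?, version = version + 1 "
            "WHERE id = ? AND status = ? AND version = ?",
            (now, msg_id, status, version),
        )
        if cur.rowcount == 1:
            return msg_id
        # Lost the race; loop and try another message.

print(grab(conn, time.time()))  # claims message 1
```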

This isn't for high performance. I tested it and got 1000 messages/sec throughput with a handful of producers and consumers against a test DB instance (limited hardware), which would be plenty for us.

I wrote it to be simple so we could easily move to something AMQP-ish like RabbitMQ or Azure Service Bus when needed. Overall it was quite easy to implement and has served us well so far.
