zlacker

1. michae+(OP)[view] [source] 2024-06-27 15:13:24
Can you run the whole task as a Postgres transaction? For example, if I want to make a job idempotent by only updating some status to "complete" once the job finishes.
replies(3): >>abelan+t1 >>teaear+V1 >>mind-b+vv
2. abelan+t1[view] [source] 2024-06-27 15:22:37
>>michae+(OP)
No, the whole task doesn't execute as a postgres transaction. Transactions will update the status of a task (and higher-order concepts like workflows) and assign/unassign work to workers, but they're short-lived by design.

For some more detail -- to ensure we can't assign duplicate work, we track which workers are assigned to jobs by using the concept of a WorkerSemaphore, where each worker slot is backed by a row in the WorkerSemaphore table. When assigning tasks, we scan the WorkerSemaphore table and use `FOR UPDATE SKIP LOCKED` to skip any locked rows held by other assignment transactions. We also have a uniqueness constraint on the task id across all WorkerSemaphores to ensure that the same task can't be held by more than one semaphore.

This is slightly different to the way most pg-backed queues work, where `FOR UPDATE SKIP LOCKED` is done on the task level, but this is because not every worker maintains its own connection to the database in Hatchet, so we use this pattern to assign tasks across multiple workers and route the task via gRPC to the correct worker after the transaction completes.
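
A rough sketch of what slot-level assignment like this could look like, for readers who haven't seen the pattern. This is not Hatchet's actual schema or query: the `worker_semaphore` table, its `id`, `worker_id` and `task_id` columns, and the `assign_task` helper are all assumptions made up for illustration. It assumes a DB-API connection to Postgres that accepts pyformat parameters (psycopg does).

```python
# Hypothetical table and column names; NOT Hatchet's real schema.
# Assumes worker_semaphore(id, worker_id, task_id) with a UNIQUE
# constraint on task_id, and one row per worker slot.
ASSIGN_SQL = """
WITH slot AS (
    SELECT id
    FROM worker_semaphore
    WHERE task_id IS NULL          -- only free slots
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED         -- skip slots locked by other assigners
)
UPDATE worker_semaphore AS ws
SET task_id = %(task_id)s
FROM slot
WHERE ws.id = slot.id
RETURNING ws.id, ws.worker_id;
"""


def assign_task(conn, task_id):
    """Claim one free slot for task_id inside a short transaction.

    Returns (slot_id, worker_id), or None when no free slot is available.
    If the task is already held by another slot, the assumed UNIQUE
    constraint makes the UPDATE fail and the transaction is rolled back.
    """
    cur = conn.cursor()
    try:
        cur.execute(ASSIGN_SQL, {"task_id": task_id})
        row = cur.fetchone()
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
    return row
```

The transaction only covers picking and marking a slot. Dispatching the task to the worker (over gRPC in Hatchet's case) would happen after `assign_task` returns, so no lock is held while the task itself runs.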

3. teaear+V1[view] [source] 2024-06-27 15:25:08
>>michae+(OP)
Not a Hatchet user, but this doesn’t sound like a Hatchet-specific question. Long-running transactions could be problematic depending on the details. I handle idempotency by not holding a transaction open and instead only upserting job records and using the job record itself to get the status. For example, if you want to know if a PDF has had all of its pages OCR’d, look at all of the job records for the PDF and aggregate them by status. If they’re all complete you’re good to go.
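
A minimal sketch of that upsert-and-aggregate approach, using Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere. The `ocr_job` table, its columns, and the helper names are made up for illustration; the same `INSERT ... ON CONFLICT ... DO UPDATE` statement shape also works in Postgres.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one job row per (pdf, page)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE ocr_job (
            pdf_id TEXT    NOT NULL,
            page   INTEGER NOT NULL,
            status TEXT    NOT NULL,
            PRIMARY KEY (pdf_id, page)
        )
        """
    )
    return conn


def record_status(conn, pdf_id, page, status):
    """Upsert the job record in its own short transaction.

    Running this twice with the same arguments leaves the same row,
    so retries are safe.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO ocr_job (pdf_id, page, status) VALUES (?, ?, ?)
            ON CONFLICT (pdf_id, page) DO UPDATE SET status = excluded.status
            """,
            (pdf_id, page, status),
        )


def pdf_is_done(conn, pdf_id, expected_pages):
    """True when every expected page has a job record marked 'complete'."""
    total, complete = conn.execute(
        """
        SELECT COUNT(*), COALESCE(SUM(status = 'complete'), 0)
        FROM ocr_job
        WHERE pdf_id = ?
        """,
        (pdf_id,),
    ).fetchone()
    return total == expected_pages and complete == expected_pages
```

Each status write is its own tiny transaction, so nothing stays locked while the OCR work runs, and a retried job simply overwrites its row with the same value.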
4. mind-b+vv[view] [source] 2024-06-27 18:16:05
>>michae+(OP)
Long-running transactions can easily lock up your database. I'd definitely avoid those. You're better off writing status records to the DB and using those to determine whether something is running, failing, etc.