zlacker

[parent] [thread] 21 comments
1. hipade+(OP)[view] [source] 2023-11-20 16:56:41
What a strange design. If a job is dependent on an extant transaction, then perhaps the job should run in the same code that initiated the transaction instead of in an outside job queue?

Also, you pass the data a job needs to run as part of the job payload. Then you don't have the "data doesn't exist" issue.

replies(5): >>zackki+B1 >>brandu+L1 >>terafl+54 >>qaq+O7 >>maherb+Ji
2. zackki+B1[view] [source] 2023-11-20 17:01:50
>>hipade+(OP)
I agree. This design is incredibly strange, and seems to throw away basically all distributed systems knowledge. I'm glad folks are playing with different ideas, but this one seems off.
replies(1): >>eximiu+64
3. brandu+L1[view] [source] 2023-11-20 17:02:48
>>hipade+(OP)
Author here.

Wanting to offload heavy work to a background job is about as old a best practice as exists in modern software engineering.

This is especially important for the kind of API and/or web development that a large number of people on this site are involved in. By offloading expensive work, you take that work out-of-band of the request that generated it, making that request faster and providing a far superior user experience.

Example: User sign-up where you want to send a verification email. Talking to a foreign API like Mailgun might be a 100 ms to multisecond (worst case scenario) operation — why make the user wait on that? Instead, send it to the background, and give them a tight < 100 ms sign up experience that's so fast that for all intents and purposes, it feels instant.
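
In sketch form, with the enqueue in the same transaction as the user insert (illustrative SQL and table names, not the library's actual API):

    BEGIN;
    -- the domain write and the job enqueue commit or roll back together
    INSERT INTO users (email) VALUES ('user@example.com');
    INSERT INTO jobs (kind, args)
        VALUES ('send_verification_email', '{"email": "user@example.com"}');
    COMMIT;
    -- a worker outside the request path sees the job only after COMMIT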

replies(2): >>stouse+Z2 >>hipade+66
4. stouse+Z2[view] [source] [discussion] 2023-11-20 17:06:43
>>brandu+L1
GP isn’t taking umbrage with the concept of needing to offload work to a background process.
5. terafl+54[view] [source] 2023-11-20 17:10:16
>>hipade+(OP)
It's not strange at all to me. The job is "transactional" in the sense that it depends on the transaction, and should be triggered iff the transaction commits. That doesn't mean it should run inside the transaction (especially since long-running transactions are terrible for performance).

Passing the job's data around separately means you're storing two copies, which creates a point where things can get out of sync.

replies(1): >>hipade+q7
6. eximiu+64[view] [source] [discussion] 2023-11-20 17:10:18
>>zackki+B1
No, this is a fairly common pattern called an 'outbox', where the emission/enqueuing of your event/message/job is tied to the transaction completion of the relevant domain data.

We use this to ensure Kafka events are only emitted when a process succeeds; this is very similar.
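
A minimal sketch of the pattern, assuming a hypothetical outbox table (names illustrative):

    BEGIN;
    UPDATE orders SET status = 'paid' WHERE id = 42;  -- domain write
    INSERT INTO outbox (topic, payload)               -- event recorded atomically
        VALUES ('order.paid', '{"order_id": 42}');
    COMMIT;
    -- the outbox row only becomes visible if the domain change commits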

replies(1): >>iskela+BH
7. hipade+66[view] [source] [discussion] 2023-11-20 17:17:18
>>brandu+L1
> Wanting to offload heavy work to a background job is absolute as old of a best practice as exists in modern software engineering.

Yes. I am intimately familiar with background jobs. In fact I've been using them long enough to know, without hesitation, that you don't use a relational database as your job queue.

replies(3): >>toolz+P7 >>qaq+q9 >>lazyan+Qr
8. hipade+q7[view] [source] [discussion] 2023-11-20 17:20:47
>>terafl+54
> should be triggered iff the transaction commits

Agreed. Which is why the design doesn't make any sense. Because in the scenario presented they're starting a job during a transaction.

replies(4): >>j45+Wm >>terafl+Zp >>Chris9+6q >>eximiu+Gs
9. qaq+O7[view] [source] 2023-11-20 17:21:54
>>hipade+(OP)
The job is not dependent on an extant transaction. The bookkeeping of the job's state runs in the same transaction as your domain-state manipulation, so you will never get into a situation where the domain mutation committed but the job state failed to update to complete.
10. toolz+P7[view] [source] [discussion] 2023-11-20 17:22:00
>>hipade+66
As far as I'm aware, the most popular job queue library in Elixir depends on Postgres and has performance characteristics that cover the vast majority of background-processing needs I've come across.

I wonder if maybe you've limited yourself by assuming relational DBs only have features for relational data. That isn't the case, and really hasn't been for quite some time now.

11. qaq+q9[view] [source] [discussion] 2023-11-20 17:26:30
>>hipade+66
Postgres-based job queues work fine if you have, say, 10K transactions per second and jobs on average don't take significant time to complete (things will run fine on a fairly modest instance). They also give guarantees that traditional job queues do not.
replies(1): >>Rapzid+8G2
12. maherb+Ji[view] [source] 2023-11-20 17:58:22
>>hipade+(OP)
I think you may be misunderstanding the design here. The transaction around initiating the job is only for queuing. The dequeue and execution of the job happen in a separate process.

The example on the home page makes this clear: a user is created and a job is created at the same time, which ensures that the job is queued up with the user creation. If any part of that initial transaction fails, then the job queuing doesn't actually happen.

13. j45+Wm[view] [source] [discussion] 2023-11-20 18:13:33
>>hipade+q7
Maybe it’s not designed for that, or for all use cases, and that can make sense.

Personally, I need long running jobs.

14. terafl+Zp[view] [source] [discussion] 2023-11-20 18:24:10
>>hipade+q7
I don't understand what you mean. The job is "created" as part of the transaction, so it only becomes visible (and hence eligible to be executed) when the transaction commits.
15. Chris9+6q[view] [source] [discussion] 2023-11-20 18:24:35
>>hipade+q7
The job is queued as part of the transaction. It is executed by a worker outside the scope of the transaction.
16. lazyan+Qr[view] [source] [discussion] 2023-11-20 18:30:41
>>hipade+66
> I've been using them long enough to know, without hesitation, that you don't use a relational database as your job queue.

I'm also very familiar with jobs, and I have used the usual tools like Redis and RMQ, but I wouldn't make a blanket statement like that. There are people using RDBMSs as queues in prod, so we have some counter-examples. I wouldn't mind at all getting rid of another system (not just one server, but the cluster of RMQ/Redis you need for HA). If there's a big risk in using pg as the backend for a task queue, I'm all ears.

17. eximiu+Gs[view] [source] [discussion] 2023-11-20 18:33:48
>>hipade+q7
That part is somewhat poorly explained. It's a motivating example of why having your job-queue system be separate from your system of record can be bad.

e.g.,

1. Application starts transaction
2. Application updates DB state (business details)
3. Application enqueues job in Redis
4. Redis job workers pick up job
5. Redis job workers error out
6. Application commits transaction

This motivates placing the job state in the same transaction, whereas non-DB-based job queues have issues like this.

18. iskela+BH[view] [source] [discussion] 2023-11-20 19:28:16
>>eximiu+64
So when the business-data transaction commits, a notify event is raised and a job row is inserted. An out-of-band job broker listens for a notify event on the job table, or polls the table skipping locked rows, and takes work for processing?
replies(2): >>eximiu+3m1 >>youerb+Cn1
19. eximiu+3m1[view] [source] [discussion] 2023-11-20 22:15:04
>>iskela+BH
Basically.

For our particular use case, I think we're actually not using notify events. We just insert rows into the outbox table, and the poller re-emits them as Kafka events and deletes successfully emitted events from the table.
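
Roughly, the poller loop looks like this (hypothetical table and column names):

    BEGIN;
    -- claim a batch, skipping rows another poller already holds
    SELECT id, topic, payload
        FROM outbox
        ORDER BY id
        LIMIT 100
        FOR UPDATE SKIP LOCKED;
    -- ...emit each claimed row to Kafka...
    DELETE FROM outbox WHERE id = ANY(:emitted_ids);  -- :emitted_ids bound by the app
    COMMIT;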

20. youerb+Cn1[view] [source] [discussion] 2023-11-20 22:24:11
>>iskela+BH
Either the business data and the job are both committed, or neither is. Then, as you write, a worker, either polling or listening for an event, can pick it up. Bonus, from an implementation perspective: if a worker selects the row FOR UPDATE (locking the job so others can't pick it up) and dies, Postgres will release the lock after some time, making the job available to other workers.
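
A common shape for that worker-side dequeue (illustrative schema, not any particular library's query):

    BEGIN;
    -- grab the next available job, skipping rows other workers have locked
    SELECT id, kind, args
        FROM jobs
        WHERE state = 'available'
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED;
    -- ...run the job, then mark it done (:id bound by the worker)...
    UPDATE jobs SET state = 'completed' WHERE id = :id;
    COMMIT;
    -- if the worker dies mid-job, the lock is released and another worker can claim the row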
21. Rapzid+8G2[view] [source] [discussion] 2023-11-21 08:04:01
>>qaq+q9
Probably an order of magnitude more, or perhaps a multiple of that, depending on the hardware and design.

In theory, an append-only and/or HOT strategy leaning on Postgres just ripping through moderate-sized in-memory lists could be incredibly fast. The design would be more complicated and perhaps use-case dependent, but I bet it could be done.

replies(1): >>qaq+vw3
22. qaq+vw3[view] [source] [discussion] 2023-11-21 14:25:11
>>Rapzid+8G2
Yep, that's why I specifically mentioned a "fairly modest instance"; on a reasonably fast box you can get a magnitude more. You can partition the tasks table to reduce the number of rows SKIP LOCKED has to run through to grab the next task.
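
For example (a sketch, assuming a state column to partition on):

    -- keep the hot partition small so SKIP LOCKED scans fewer rows
    CREATE TABLE tasks (
        id    bigserial,
        state text NOT NULL,
        args  jsonb,
        PRIMARY KEY (id, state)
    ) PARTITION BY LIST (state);

    CREATE TABLE tasks_available PARTITION OF tasks FOR VALUES IN ('available');
    CREATE TABLE tasks_done      PARTITION OF tasks FOR VALUES IN ('completed', 'failed');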