Also, you pass the data a job needs to run as part of the job payload. Then you don't have the "data doesn't exist" issue.
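A rough sketch of the difference (the field names are just illustrative):

```python
import json

# Payload that only references data: the worker has to look the user up,
# and fails if the row doesn't exist yet (or was rolled back).
reference_payload = json.dumps({"user_id": 123})

# Self-contained payload: everything the worker needs travels with the job,
# so there is no "data doesn't exist" lookup at execution time.
self_contained_payload = json.dumps({
    "user_id": 123,
    "email": "person@example.com",
    "verification_token": "abc123",
})
```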
Wanting to offload heavy work to a background job is about as old a best practice as exists in modern software engineering.
This is especially important for the kind of API and/or web development that a large number of people on this site are involved in. By offloading expensive work, you move that work out of band from the request that generated it, making that request faster and providing a far superior user experience.
Example: a user sign-up where you want to send a verification email. Talking to a third-party API like Mailgun might take anywhere from 100 ms to several seconds in the worst case; why make the user wait on that? Instead, send it to the background and give them a tight < 100 ms sign-up experience that's so fast that, for all intents and purposes, it feels instant.
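A minimal sketch of that shape, using a plain Redis list as the queue rather than any particular job library (the key and field names are made up):

```python
import json
import time
import redis  # redis-py client

r = redis.Redis()

def sign_up(email: str) -> dict:
    # Do only the fast, essential work inline: create the account record.
    user = {"id": 123, "email": email}  # stand-in for the real INSERT

    # Enqueue the slow part (talking to Mailgun or similar) instead of
    # blocking the request on it; a worker picks this up out of band.
    r.rpush("jobs:send_verification_email", json.dumps({
        "user_id": user["id"],
        "email": user["email"],
        "enqueued_at": time.time(),
    }))
    return user  # the HTTP response returns immediately
```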
Passing the job's data around separately means you're now storing two copies, which creates a point where things can get out of sync.
We use this to ensure Kafka events are only emitted when a process succeeds; it's very similar.
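Roughly what the write side of that looks like, assuming psycopg2 and a hypothetical `outbox` table; the event row only exists if the surrounding transaction commits:

```python
import json
import psycopg2  # table and column names here are illustrative

conn = psycopg2.connect("dbname=app")

def complete_order(order_id: int) -> None:
    with conn:  # commits on success, rolls back on exception
        with conn.cursor() as cur:
            # Business state change...
            cur.execute(
                "UPDATE orders SET status = 'complete' WHERE id = %s",
                (order_id,),
            )
            # ...and the event row in the same transaction. If anything above
            # fails, this row never exists, so the Kafka event is never emitted.
            cur.execute(
                "INSERT INTO outbox (topic, payload) VALUES (%s, %s)",
                ("orders", json.dumps({"type": "order_completed",
                                       "order_id": order_id})),
            )
```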
Yes. I am intimately familiar with background jobs. In fact I've been using them long enough to know, without hesitation, that you don't use a relational database as your job queue.
Agreed, which is why the design doesn't make any sense: in the scenario presented, they're starting a job during a transaction.
I wonder if maybe you've limited yourself by assuming relational DBs only have features for relational data. That isn't the case now, and really hasn't been for quite some time.
The example on the home page makes this clear: a user and a job are created at the same time. This ensures that the job is queued up together with the user creation. If any part of that initial transaction fails, then the job queuing doesn't happen either.
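In plain SQL terms (not the library's actual API; table and column names are made up), the idea is just two inserts in one transaction:

```python
import json
import psycopg2

conn = psycopg2.connect("dbname=app")

def create_user(email: str) -> None:
    with conn:  # single transaction: both rows commit, or neither does
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO users (email) VALUES (%s) RETURNING id",
                (email,),
            )
            user_id = cur.fetchone()[0]
            # The job row rides along in the same transaction, so it can never
            # reference a user that was rolled back, and the user can never be
            # created without its job being queued.
            cur.execute(
                "INSERT INTO jobs (kind, args) VALUES (%s, %s)",
                ("send_verification_email",
                 json.dumps({"user_id": user_id, "email": email})),
            )
```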
Personally, I need long-running jobs.
I'm also very familiar with jobs and have used the usual tools like Redis and RMQ, but I wouldn't make a blanket statement like that. There are people using RDBMSes as queues in prod, so we have some counter-examples. I wouldn't mind at all getting rid of another system (not just one server, but the cluster of RMQ/Redis you need for HA). If there's a big risk in using pg as the backend for a task queue, I'm all ears.
e.g.,
1. Application starts transaction
2. Application updates DB state (business details)
3. Application enqueues job in Redis
4. Redis job workers pick up the job
5. Redis job workers error out
6. Application commits transaction
This motivates placing the job state in the same transaction as the business data, whereas non-DB-based job queues are exposed to races like the one above.
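A sketch of that failure mode, with made-up names; the worker can observe the job before the row it refers to is visible, and the commit can still fail after the enqueue:

```python
import json
import psycopg2
import redis

conn = psycopg2.connect("dbname=app")
r = redis.Redis()

def create_invoice_broken(customer_id: int) -> None:
    cur = conn.cursor()
    # Steps 1-2: transaction starts, business state is written (uncommitted).
    cur.execute(
        "INSERT INTO invoices (customer_id) VALUES (%s) RETURNING id",
        (customer_id,),
    )
    invoice_id = cur.fetchone()[0]

    # Step 3: the job is pushed to Redis while the transaction is still open.
    # Steps 4-5: a worker can pick it up immediately, query for the invoice,
    # and error out because the row isn't visible yet (or never will be, if
    # the commit below fails).
    r.rpush("jobs:send_invoice", json.dumps({"invoice_id": invoice_id}))

    # Step 6: only now does the invoice become visible to other connections.
    conn.commit()
```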
For our particular use case, I think we're actually not using notify events. We just insert rows into the outbox table, and the poller re-emits them as Kafka events and deletes successfully emitted events from the table.
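A bare-bones sketch of such a poller, assuming psycopg2 and the kafka-python client, with a hypothetical `outbox(id, topic, payload)` table that stores the payload as text:

```python
import psycopg2
from kafka import KafkaProducer  # kafka-python; any producer client would do

conn = psycopg2.connect("dbname=app")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

def poll_outbox_once(batch_size: int = 100) -> None:
    with conn:  # one transaction per batch
        with conn.cursor() as cur:
            # Claim a batch; SKIP LOCKED lets several pollers run side by side.
            cur.execute(
                "SELECT id, topic, payload FROM outbox "
                "ORDER BY id LIMIT %s FOR UPDATE SKIP LOCKED",
                (batch_size,),
            )
            rows = cur.fetchall()
            for row_id, topic, payload in rows:
                # .get() blocks until the broker acks and raises on failure,
                # which aborts the transaction so nothing gets deleted below.
                producer.send(topic, payload.encode()).get(timeout=10)
            if rows:
                # Only rows that were actually emitted get removed.
                cur.execute(
                    "DELETE FROM outbox WHERE id = ANY(%s)",
                    ([row_id for row_id, _, _ in rows],),
                )
```

A crash between the send and the delete just means those rows get re-emitted on the next poll, so consumers have to tolerate at-least-once delivery.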
In theory, an append-only and/or HOT-update strategy that leans on Postgres just ripping through moderately sized in-memory lists could be incredibly fast. The design would be more complicated and perhaps use-case dependent, but I bet it could be done.
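One way that lean might look, very much a sketch with hypothetical names: jobs are inserted once and consumed with a single `DELETE ... RETURNING`, so there are no in-place status updates to churn the table, and `SKIP LOCKED` lets workers pull batches concurrently:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")

def dequeue_batch(limit: int = 10) -> list:
    # Rows are written once and deleted once (no repeated UPDATEs), which
    # keeps the table close to append-only and limits dead-tuple buildup.
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                DELETE FROM jobs
                WHERE id IN (
                    SELECT id FROM jobs
                    ORDER BY id
                    LIMIT %s
                    FOR UPDATE SKIP LOCKED
                )
                RETURNING id, kind, args
                """,
                (limit,),
            )
            return cur.fetchall()
```

As written, the batch is gone once the transaction commits, so you'd either do the work inside that transaction or accept at-most-once delivery; it's only meant to show the append-only lean.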