I mitigate this in several ways. First, only one worker can work on a job or child job at a time, so at worst one worker is stuck in a loop. There's also a maximum number of retries, so a job is never retried indefinitely. Increasing the stall time per request would be trivial, but it has never become an issue. Monitoring for potentially infinite jobs is as simple as querying for jobs that have reached the retry limit.
I'm sure Erlang or other actor-model implementations would handle this incredibly well at scale, but it seems to me that just doing it in the database works well enough for at least certain workloads.
As always, it comes down to the right tool for the job. The advantage of the database is that it's most likely already in the stack. I don't doubt that there are situations where it's the wrong tool. That's why it's interesting to know at which point it stops being a good solution, so that one can make an informed decision.