The dashboard in Hatchet has a great GUI where you can navigate between all the tasks, see how they all connect, see the data passed in to each one, see the return results from each task, and each one has a log box you can print information to. You can rerun tasks, override variables, trigger identical workflows, filter tasks by metadata
It's dramatically reduced the amount of time it takes me to spot, identify, and fix bugs. I miss the simplicity of SAQ but that's the reason I switched and it's paid off already
FWIW, Ive used both dramatiq/celery with redis in heavy prod environments and never had an issue with debugging. And Im having a tough time understanding how switching the underlying queue infrastructure would have made my life easier.
It's nice to be running on Postgres (i.e. not really having to worry about payload size, I heard some people were passing images from task to task) but for me that is just a nicety and wasn't a reason to switch.
If you're happy with your current infra, happy with the visibility, and there's nothing lacking in the development perspective, then yeah probably not much point in switching your infra to begin with [1]. But if you're building complicated workflows, and just want your code to run with an extreme level of visibility, it's worth checking out Hatchet.
[1] I'm sure the founders would have more to say here, but as a consumer I'm not really deep in the architecture of the product. Best I could do could be to give you 100 reasons I will never use Celery again XD
How? This issue still seems to be open after 6 years: https://github.com/celery/celery/issues/5149
The point of Hatchet is to support more complex behavior - like chaining tasks together, building automation around querying and retrying failed tasks, handling a lot of the fairness and concurrency use-cases you'd otherwise need to build yourself, etc - or just getting something that works out of the box and can support those use-cases in the future.
And if you are running at low volume and trying to debug user issues, a grafana panel isn't going to get you the level of granularity or admin control you need to track down the errors in your methods (rather than just at the queue level). You'd need to integrate your task queue with Sentry and a logging system - and in our case, error tracing and logging are available in the Hatchet UI.
(Wiring together 40+ preemptible TPUs was a nice crucible for learning about all of these. And much like a crucible, it was as painful as it sounds. Hatchet would’ve been nice.)
Thanks for making this!
Configurable retry delays are currently in development.