Hey HN - this is Alexander and Gabe from Hatchet (https://hatchet.run). We’re building a modern task queue as an alternative to tools like Celery for Python and BullMQ for Node. Our open-source repo is at https://github.com/hatchet-dev/hatchet and is 100% MIT licensed.
When we did a Show HN a few months ago (https://news.ycombinator.com/item?id=39643136), our cloud version was invite-only and we were focused on our open-source offering.
Today we’re launching our self-serve cloud so that anyone can get started creating tasks on our platform - you can get started at https://cloud.onhatchet.run, or you can use these credentials to access a demo (should be prefilled):
URL: https://demo.hatchet-tools.com
Email: hacker@news.ycombinator.com
Password: HatchetDemo123!
People are currently using Hatchet for a bunch of use-cases: orchestrating RAG pipelines, queueing up user notifications, building agentic LLM workflows, and scheduling image generation tasks on GPUs.

We built this out of frustration with existing tools and a conviction that PostgreSQL is the right choice for a task queue. Beyond the fact that many developers already have Postgres in their stack, which makes it easier to self-host Hatchet, it's also easier to model higher-order concepts in Postgres, like chains of tasks (which we call workflows). In our system, the acknowledgement of the task, the task result, and the updates to higher-order models are all part of the same Postgres transaction, which significantly reduces the risk of data loss and race conditions compared with other task queues (which usually pass acknowledgements through a broker, store task results elsewhere, and only then figure out the next task in the chain).
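To make the transactional claim concrete, here's a minimal sketch of the pattern in Python with psycopg (the table and column names are ours for illustration; this is not Hatchet's actual schema or code):

    import psycopg
    from psycopg.types.json import Jsonb

    def complete_task(conn: psycopg.Connection, task_id: int,
                      result: dict, next_step: str | None):
        # Acknowledge the task, store its result, and enqueue the next task
        # in the chain inside one transaction: all of it lands or none of it.
        with conn.transaction():
            with conn.cursor() as cur:
                cur.execute(
                    """
                    UPDATE tasks
                    SET status = 'succeeded', result = %s, finished_at = now()
                    WHERE id = %s
                    """,
                    (Jsonb(result), task_id),
                )
                if next_step is not None:
                    cur.execute(
                        "INSERT INTO tasks (step, status, input)"
                        " VALUES (%s, 'queued', %s)",
                        (next_step, Jsonb(result)),
                    )

With a broker-based queue, these writes go to at least two different systems, so a crash between them leaves the broker and the database disagreeing about what happened.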
We also became increasingly frustrated with tools like Celery and the challenges they introduce when using a modern Python stack (Python > 3.5). We wrote up a list of these frustrations here: https://docs.hatchet.run/blog/problems-with-celery.
Since our Show HN, we’ve (partially or completely) addressed the most common pieces of feedback from the post, which we’ll outline here:
1. The most common ask was built-in support for fanout workflows — one task which triggers an arbitrary number of child tasks to run in parallel. We previously only supported DAG executions. We generalized this concept and launched child workflows (https://docs.hatchet.run/home/features/child-workflows); there's a rough sketch of the fanout pattern after this list. This is the first step towards a developer-friendly model of durable execution.
2. Support for HTTP-based triggers — we’ve built out support for webhook workers (https://docs.hatchet.run/home/features/webhooks), which allow you to trigger any workflow over an HTTP webhook. This is particularly useful for apps on Vercel, who are dealing with timeout limits of 60s, 300s, or 900s (depending on your tier).
3. Our RabbitMQ dependency — while we haven't gotten rid of this completely, we've recently launched hatchet-lite (https://docs.hatchet.run/self-hosting/hatchet-lite), which lets you run the various Hatchet components in a single Docker image that bundles RabbitMQ along with a migration process, an admin CLI, our REST API, and our gRPC engine. Hopefully the "lite" was a giveaway, but this is meant for local development and low-volume processing, on the order of hundreds of tasks per minute.
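For a rough sense of what fanout looks like from the parent's side, here's a sketch in the Python SDK. The decorator and method names (hatchet.workflow, hatchet.step, context.spawn_workflow) follow the linked docs at the time of writing and may differ in the current SDK, so treat this as illustrative:

    from hatchet_sdk import Context, Hatchet

    hatchet = Hatchet()

    @hatchet.workflow(on_events=["fanout:run"])
    class Parent:
        @hatchet.step()
        def spawn(self, context: Context):
            # Spawn an arbitrary number of children in parallel; unlike a
            # static DAG, the count can depend on the workflow input.
            n = context.workflow_input().get("n", 10)
            for i in range(n):
                context.spawn_workflow("Child", {"index": i}, key=f"child-{i}")
            return {"spawned": n}

    @hatchet.workflow()
    class Child:
        @hatchet.step()
        def work(self, context: Context):
            return {"index": context.workflow_input()["index"]}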
We’ve also launched more features, like support for global rate limiting, steps which only run on workflow failure, and custom event streaming.
We’ll be here the whole day for questions and feedback, and look forward to hearing your thoughts!
We're eventually going to support a lightweight Postgres-backed messaging table, but the number of pub/sub messages sent through RabbitMQ is typically an order of magnitude higher than the number of tasks sent.
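(For context, the standard building block for a Postgres-backed message table is FOR UPDATE SKIP LOCKED, which lets competing consumers each claim a different row. A rough sketch, with table names that are purely illustrative rather than a committed Hatchet design:

    import psycopg

    def pop_message(conn: psycopg.Connection):
        # Each worker claims a different unlocked row, so two workers never
        # process the same message; None means nothing is available.
        with conn.transaction():
            with conn.cursor() as cur:
                cur.execute(
                    """
                    SELECT id, payload FROM messages
                    WHERE status = 'queued'
                    ORDER BY id
                    LIMIT 1
                    FOR UPDATE SKIP LOCKED
                    """
                )
                row = cur.fetchone()
                if row is None:
                    return None
                cur.execute(
                    "UPDATE messages SET status = 'running' WHERE id = %s",
                    (row[0],),
                )
                return row
)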
Part of the reason for working on Hatchet (this version) was that I built a Terraform management tool on top of Temporal and felt there was room for improvement.
(for those curious - https://github.com/hatchet-dev/hatchet-v1-archived)
I know a lot of folks are going after the AI agent workflow orchestration space. Do you see yourselves progressing there?
In my head, Hatchet coupled with BAML (https://www.boundaryml.com/) could be an incredible combination to support these AI agents. Congrats on the launch
Also, somewhat related, years ago I wrote a very small framework for fan-out of Django-based tasks in Celery. We have been running it in production for years. It doesn't have adoption beyond our company, but I think there are some good ideas in it. Feel free to take a look if it's of interest! https://github.com/groveco/django-sprinklers
To that end, we’re building Hatchet to orchestrate agents with commonly needed features, like streaming from running workers to the frontend [1] and rate limiting [2], without imposing too many opinions on core application logic.
[1] https://docs.hatchet.run/home/features/streaming [2] https://docs.hatchet.run/home/features/rate-limits
Yep, we agree - this is more a matter of bandwidth, as well as figuring out the final definition of the pub/sub interface. We'd prefer not to maintain two message queue implementations, but we likely won't drop the RabbitMQ implementation entirely, even if we offer Postgres as an alternative. So if we do need to support two implementations, we'd prefer to build out a core set of features that we're happy with first. That said, the message queue API is definitely stabilizing (https://github.com/hatchet-dev/hatchet/blob/31cf5be248ff9ed7...), so I hope we can pick this up in the coming months.
> You also compare yourself against Celery and BullMQ, but there is also talk in the readme around durable execution. That to me puts you in the realm of Temporal. How would you say you compare/compete with Temporal? Are you looking to compete with them?
Yes, our child workflows feature is an alternative to Temporal which lets you execute Temporal-like workflows. These are durable from the perspective of the parent step which executes them, as any events generated by the child workflows get replayed if the parent step re-executes. Non-parent steps are the equivalent of a Temporal activity, while parent steps are the equivalent of a Temporal workflow.
Our longer-term goal is to build a better developer experience than Temporal, centered around observability and worker management. On the observability side, we're investing heavily in our dashboard, eventing, alerting and logging features. On the worker management side, we'd love to integrate more natively with worker runtime environments to handle use-cases like autoscaling.
1. Commit the changes in the DB first: if you then fail to enqueue the task, there will be data rows sitting in the DB with no task to process them.
2. Push the task first: the task may kick off too early, before the DB transaction has committed, so the worker can't find the rows that are still inside the transaction, and you need to retry on failure. (See the sketch after this list.)
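To make the two failure modes concrete, here's a sketch using a Celery-style API (process_row and the SQLAlchemy session are hypothetical stand-ins):

    from celery import shared_task

    @shared_task
    def process_row(row_id: int):
        ...  # look the row up by id and process it

    def commit_first(session, row):
        session.add(row)
        session.commit()           # row is durable...
        process_row.delay(row.id)  # ...but if the enqueue fails (crash,
                                   # broker down), the row is orphaned forever

    def enqueue_first(session, row):
        session.add(row)
        session.flush()            # assigns row.id without committing
        process_row.delay(row.id)  # a worker may pick this up and query the
        session.commit()           # DB before this commit lands, seeing no
                                   # row, forcing a retry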
We also looked at Celery, hoping it could offer something similar, but the issue has been open for years:
https://github.com/celery/celery/issues/5149
Given these needs, I built a simple Python library on top of SQLAlchemy:
https://github.com/LaunchPlatform/bq
It would be super cool if Hatchet also supported native SQL inserts with ORM frameworks. Without the ability to commit tasks together with all the other data rows, I think it misses out on part of the benefit of using a database as the worker queue backend.
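For what it's worth, here's roughly what I mean with SQLAlchemy (the Task model is a hypothetical stand-in in the spirit of bq, not an existing Hatchet API):

    from sqlalchemy import JSON, Column, Integer, String
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Task(Base):
        # Hypothetical task table living in the same database as your data.
        __tablename__ = "tasks"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)
        kwargs = Column(JSON, default=dict)

    def create_order_with_task(session: Session, order):
        session.add(order)
        session.flush()  # assigns order.id without committing
        session.add(Task(name="process_order", kwargs={"order_id": order.id}))
        session.commit()  # data row and task row land (or roll back) atomically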
* Inngest is fully event driven, with replays, fan-outs, `step.waitForEvent` to automatically pause and resume durable functions when specific events are received, declarative cancellation based off of events, etc.
* We have real-time metrics, tracing, etc. out of the box in our UI
* Out-of-the-box support for TS, Python, Golang, Java. The SDKs are also interchangeable, allowing zero-downtime language and cloud migrations
* I don't know Hatchet's local dev story, but it's a one-liner for us
* Batching, to turn e.g. 100 events into a single execution
* Concurrency, throttling, rate limiting, and debouncing, built in and operate at a function level
* Support for your own multi-tenancy keys, allowing you to create queues and set concurrency limits per tenant
* Works on serverless, on servers, or anywhere
* And, specifically, it's all procedural and doesn't have to be a DAG.
We've also invested heavily in flow control — the aspects of batching, concurrency, custom multi-tenancy controls, etc. are all things that you have to layer over other systems.
I expect because we've been around for a couple of years that newer folks like Hatchet end up trying to replicate some of what we've done, though building this takes quite some time. Either way, happy to see our API and approach start to spread :)
1. Hatchet is MIT licensed and designed to be self-hosted in production, with cloud as an alternative. While the Inngest dev server is open source, it doesn't support self-hosting: https://www.inngest.com/docs/self-hosting.
2. Inngest is built on an HTTP webhook model while Hatchet is built on a long-lived, client-initiated gRPC connection. While we support HTTP webhooks for serverless environments, a core part of the Hatchet platform is built to display the health of a long-lived worker and provide worker-level metrics that can be used for autoscaling. All async runtimes that we've worked on in the past have eventually migrated off of serverless for a number of reasons, like reducing latency or having more control over things like runtime environment and DB connections. AFAIK the concept of a worker or worker health doesn't exist in Inngest.
There are the finer details which we can hash out in the other thread, but both products rely on events, tasks and durable workflows as core concepts, and there's a lot of overlap.
Hatchet is also event driven [1], has built-in support for tracing and metrics, offers TS [2], Python [3], and Golang [4] SDKs, supports throttling and rate limiting [5] and concurrency with custom multi-tenancy keys [6], works on serverless [7], and supports procedural workflows [8].
That said, there are certainly lots of things to work on. Batching and better tracing are on our roadmap. And while we don’t have a Java SDK, we do have a GitHub discussion for future SDKs that you can vote on here: https://github.com/hatchet-dev/hatchet/discussions/436.
[1] https://docs.hatchet.run/home/features/triggering-runs/event...
[2] https://docs.hatchet.run/sdks/typescript-sdk
[3] https://docs.hatchet.run/sdks/python-sdk
[4] https://docs.hatchet.run/sdks/go-sdk
[5] https://docs.hatchet.run/home/features/rate-limits
[6] https://docs.hatchet.run/home/features/concurrency/round-rob...
https://x.com/mitchellh/status/1759626842817069290?s=46&t=57...
That's not to say you can't use Hatchet for data pipelines - this is a common use-case. But you probably don't want to use Hatchet for big data pipelines where payload sizes are very large and you're working with payloads that aren't JSON serializable.
Airflow also tends to be quite slow when the task itself is short-lived. We don't have benchmarks, but you can have a look at Windmill's benchmarks on this: https://www.windmill.dev/docs/misc/benchmarks/competitors#re....
I hated the configuration and management complexity of RabbitMQ and Celery and pretty much everything else.
My ultimate goal was to build a message queue that was extremely fast, required absolutely zero config, and was HTTP-based, so it has no requirement for any specific client.
I developed one in Python that was pretty complete but slow, then developed a prototype in Rust that was extremely fast but incomplete.
The latest is sasquatch. It's written in Go, uses SQLite for the DB, and behaves in a very similar way to Amazon SQS in that connections are HTTP and it uses long polling to wait for messages.
https://github.com/crowdwave/sasquatch
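As a sketch of the long-polling model (the endpoint and parameters here are hypothetical, just to illustrate the SQS-style interaction):

    import requests

    def receive_messages(base_url: str, queue: str, wait_seconds: int = 20):
        # Long polling: the server holds the request open for up to
        # wait_seconds and responds as soon as a message arrives, instead
        # of the client hammering the server with rapid empty polls.
        resp = requests.get(
            f"{base_url}/queues/{queue}/messages",  # hypothetical endpoint
            params={"wait": wait_seconds},
            timeout=wait_seconds + 5,  # allow for the server-side hold
        )
        resp.raise_for_status()
        return resp.json()  # empty list if the wait elapsed with no messages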
It's only in the very early stages of development and likely doesn't even compile yet, but most of the code is in place. I'm hoping to get to the next phase of development soon.
I just love the idea of a message queue that is a single static binary and when you run it, you have a fully functioning message queue with nothing more to do - not even fiddling with Postgres.
Absolutely zero config, not minutes, hours, or days of futzing with configs and blogs and tutorials.
- Blake, co-author of riverqueue.com / https://github.com/riverqueue/river :)
There are many great distributed job runners out there. I've never found one for Go that gives me the features without running 7 processes and message queues sprawled over hosts and Docker containers.
jorb is just a framework to slap into a Go script when you want to fire a lot of work at your computer and let it run to completion.
I've tried to build this many times and this is the first time I've gotten it to stick.
Yes, you can do this with core Go primitives, but I find this abstraction to be a lot better, and (eventually) it made deadlocks easier to debug.
I'm just putting it here because it's semi-related.
How? This issue still seems to be open after 6 years: https://github.com/celery/celery/issues/5149
https://github.com/rails/solid_queue
Just trying to understand. I do get that Hatchet is a language-agnostic, SDK/API kind of solution.
The point of Hatchet is to support more complex behavior - like chaining tasks together, building automation around querying and retrying failed tasks, handling a lot of the fairness and concurrency use-cases you'd otherwise need to build yourself, etc. - or just getting something that works out of the box and can support those use-cases in the future.
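For instance, chaining in the Python SDK looks roughly like this (a sketch based on our docs; double-check names like parents and step_output against the current SDK):

    from hatchet_sdk import Context, Hatchet

    hatchet = Hatchet()

    @hatchet.workflow(on_events=["order:created"])
    class OrderPipeline:
        @hatchet.step()
        def validate(self, context: Context):
            return {"valid": True}

        @hatchet.step(parents=["validate"], retries=3)
        def charge(self, context: Context):
            # Runs only after validate succeeds, with automatic retries.
            upstream = context.step_output("validate")
            return {"charged": upstream["valid"]}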
And if you are running at low volume and trying to debug user issues, a Grafana panel isn't going to give you the level of granularity or admin control you need to track down errors in your methods (rather than just at the queue level). You'd need to integrate your task queue with Sentry and a logging system - and in our case, error tracing and logging are available in the Hatchet UI.
Configurable retry delays are currently in development.