NATS vs MQTT vs Kafka vs Redis Queue vs Amazon SQS - how do they all stack up?
Depending on the throughput you need, the number of clients putting items into and pulling them from the queue, and other factors, you can rule out certain options. For example, a Redis queue will be one of the fastest choices, but its capacity is limited by memory size. SQS has basically unlimited capacity, but 150ms or more can often elapse between when you put an item in and when it's available.
As far as ideal use cases, we use NATS for https://plane.dev in two ways:
- As a message bus, it is a layer of abstraction on top of the network. Instead of each node needing to establish a connection to every node it needs to connect to, it just connects to a NATS cluster and messages are routed by subject. This is great for debugging because we can "wiretap" messages on a given subject pattern and verify what's being sent. We even have a service that listens on NATS subjects and conditionally turns events into Slack messages.
- It has a built-in Raft implementation (via JetStream), which we piggyback on when we need to create consensus among nodes.
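To make the "wiretap on a subject pattern" idea concrete, here is a minimal sketch of NATS-style subject matching, where `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. The `subject_matches` function, the `Wiretap` class, and the `cluster.*` subjects are hypothetical names for illustration; this is not the NATS server's implementation or plane.dev's code.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a dot-separated subject matches a wildcard pattern."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one
            # subject token left to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject is shorter than the pattern
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch
    return len(p_tokens) == len(s_tokens)


class Wiretap:
    """Hypothetical debugging tap: records every message matching a pattern."""

    def __init__(self, pattern: str):
        self.pattern = pattern
        self.seen = []

    def observe(self, subject: str, payload: str) -> None:
        if subject_matches(self.pattern, subject):
            self.seen.append((subject, payload))


tap = Wiretap("cluster.*.status")
tap.observe("cluster.node1.status", "ready")
tap.observe("cluster.node1.logs", "ignored by this tap")
```

With a real client you would just subscribe to the pattern and let the server do the matching; the sketch only shows what the pattern semantics look like.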
NATS's only responsibility is to route messages in near-real-time from publishers to consumers. Messages are ephemeral and dropped immediately after delivery; if nobody is listening, the messages vanish. Messages are only queued temporarily in RAM if the consumer is busy, and they can get dropped if a consumer doesn't handle them fast enough. In short, NATS is very lightweight and fast, and designed for things that are lightweight and fast. It's like a kind of distributed socket mechanism, and works best as a communication primitive you build stuff on top of (like TCP or UDP) rather than a fully fledged system.
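A toy in-memory model of that behavior, as a sketch rather than anything resembling the real server: publishing to a subject with no subscribers discards the message, and a subscriber with a full pending buffer loses messages. `EphemeralBus`, `Subscriber`, and the `max_pending` limit are made-up names, and the choice to drop the newest message when the buffer is full is an assumption of this sketch.

```python
from collections import deque


class Subscriber:
    """Holds a small in-memory buffer of undelivered messages."""

    def __init__(self, max_pending: int):
        self.pending = deque()
        self.max_pending = max_pending
        self.dropped = 0

    def offer(self, msg) -> None:
        if len(self.pending) >= self.max_pending:
            # Slow consumer: the buffer is full, so this message is lost.
            self.dropped += 1
            return
        self.pending.append(msg)

    def drain(self) -> list:
        """Simulate the consumer catching up on its buffered messages."""
        msgs = list(self.pending)
        self.pending.clear()
        return msgs


class EphemeralBus:
    """Exact-match subjects only; nothing is ever written to disk."""

    def __init__(self):
        self.subscribers = {}  # subject -> list of Subscriber

    def subscribe(self, subject: str, max_pending: int = 2) -> Subscriber:
        sub = Subscriber(max_pending)
        self.subscribers.setdefault(subject, []).append(sub)
        return sub

    def publish(self, subject: str, msg) -> int:
        """Deliver to current subscribers; return how many were offered it."""
        subs = self.subscribers.get(subject, [])
        for sub in subs:
            sub.offer(msg)
        return len(subs)


bus = EphemeralBus()
offered_to = bus.publish("telemetry.errors", "nobody hears this")  # no listeners
sub = bus.subscribe("telemetry.errors", max_pending=2)
for n in range(3):
    bus.publish("telemetry.errors", f"error {n}")
```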
So it's very different from Kafka and other types of queues that are durable and database-like. Kafka is good for "fat pipes" that centralize data from producers into a log which is then consumed by massively parallel sets of consumers, and you don't constantly change this topology. NATS is good for networks of fast-changing producers and consumers that send small messages to each other, often one-on-one, although any fan-out topology works. It's great for firehose-type routing. For example, imagine you want your app to produce lots of different kinds of telemetry. Your app just sends messages to a topic "telemetry" or maybe a dotted topic like "telemetry.iostats" or "telemetry.errors". Then a client can "tap" into that topic by listening to it. If no client is listening, the firehose goes nowhere. But then a client can tap into "telemetry.errors" and get just the stream of error messages. Topics are just strings, so you can create unique topics for temporary things; an app can send a message to another app like "hey, do some work and then send the result to my temporary topic foobar726373".
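The "do some work and send the result to my temporary topic" pattern can be sketched with a toy synchronous bus. Everything here (`Bus`, `request`, `start_square_worker`, the `work.square` subject, the `inbox.` prefix) is hypothetical and only illustrates the idea of a throwaway reply subject; a real client library handles this for you and does it asynchronously.

```python
import uuid


class Bus:
    """Toy in-process bus: exact-match subjects, synchronous delivery."""

    def __init__(self):
        self.handlers = {}  # subject -> list of callables(payload, reply_to)

    def subscribe(self, subject, handler):
        self.handlers.setdefault(subject, []).append(handler)

    def unsubscribe(self, subject):
        self.handlers.pop(subject, None)

    def publish(self, subject, payload, reply_to=None):
        # If nobody is subscribed, the message simply goes nowhere.
        for handler in list(self.handlers.get(subject, [])):
            handler(payload, reply_to)


def start_square_worker(bus):
    """Worker: squares the number it receives and replies to the given subject."""
    def handle(payload, reply_to):
        bus.publish(reply_to, payload * payload)
    bus.subscribe("work.square", handle)


def request(bus, subject, payload):
    """Publish a request with a unique temporary reply subject."""
    inbox = "inbox." + uuid.uuid4().hex  # temporary subject, just a string
    replies = []
    bus.subscribe(inbox, lambda body, _reply_to: replies.append(body))
    bus.publish(subject, payload, reply_to=inbox)
    bus.unsubscribe(inbox)  # the temporary subject disappears after use
    return replies[0] if replies else None


bus = Bus()
start_square_worker(bus)
result = request(bus, "work.square", 7)
no_worker = request(bus, "work.missing", 7)  # no listener, so no reply
```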
NATS is particularly notable for its "just works" design. The clustering, for example, ties together brokers with no effort. Clients typically don't need any configuration at all, other than the name of a NATS server.
NATS can be used as a low-level component to build stateful stuff. NATS JetStream is a Kafka-like solution that stores durable logs, and uses NATS as its communication protocol. Liftbridge is another one.
But yes, to be able to replay without side effects you’ll want to make sure you’re setting up the consumers correctly. That may need some custom logic, but isn’t that necessary with any message queue?
[1]: https://docs.nats.io/using-nats/developer/develop_jetstream
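One common form that custom logic takes is an idempotent consumer that remembers which message IDs it has already applied, so a replay does not repeat the side effect. This is a minimal sketch under the assumption that every message carries a unique ID; `IdempotentConsumer` and its fields are hypothetical, and in a real system the set of seen IDs would have to be stored durably alongside the side effect itself.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by a unique message ID."""

    def __init__(self):
        self.seen_ids = set()  # in-memory only; an assumption of this sketch
        self.total = 0         # stand-in for a real side effect

    def handle(self, msg_id: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if msg_id in self.seen_ids:
            return False
        self.seen_ids.add(msg_id)
        self.total += amount
        return True


stream = [("m1", 10), ("m2", 5), ("m3", 1)]
consumer = IdempotentConsumer()
first_pass = [consumer.handle(msg_id, amount) for msg_id, amount in stream]
replay = [consumer.handle(msg_id, amount) for msg_id, amount in stream]
```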
I'm curious why RabbitMQ and AMQP are no longer part of such comparisons (not only your comment; nobody else on this thread has mentioned them).
NATS does not have stateful storage. So when a consumer disconnects and reconnects, there is nowhere for NATS to store the messages in the meantime. You can solve this by storing messages in stateful storage first, then using NATS as a way to distribute them. You would need your own mechanism to replay messages on reconnect. This is coincidentally what JetStream does. It uses NATS internally as a network protocol, but it's a separate thing.
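Roughly, the "store first, then distribute, replay on reconnect" approach looks like the sketch below. It is a toy model with hypothetical names (`DurableLog`, `ReplayingConsumer`, `publish`), a Python list standing in for real persistent storage, and direct method calls standing in for the network; it is not how JetStream is implemented.

```python
class DurableLog:
    """Append-only log with 1-based sequence numbers."""

    def __init__(self):
        self.entries = []  # stand-in for storage that survives restarts

    def append(self, payload) -> int:
        self.entries.append(payload)
        return len(self.entries)

    def read_from(self, seq: int) -> list:
        """Return (seq, payload) pairs starting at the given sequence number."""
        return list(enumerate(self.entries[seq - 1:], start=seq))


class ReplayingConsumer:
    """Tracks the last sequence it processed and catches up on reconnect."""

    def __init__(self, log: DurableLog):
        self.log = log
        self.last_seq = 0
        self.received = []
        self.connected = False

    def connect(self) -> None:
        self.connected = True
        # Replay whatever was stored while this consumer was away.
        for seq, payload in self.log.read_from(self.last_seq + 1):
            self.deliver(seq, payload)

    def disconnect(self) -> None:
        self.connected = False

    def deliver(self, seq: int, payload) -> None:
        # Live delivery is lost while disconnected; the log fills the gap later.
        if self.connected and seq == self.last_seq + 1:
            self.received.append(payload)
            self.last_seq = seq


def publish(log: DurableLog, consumers: list, payload) -> int:
    """Store first, then push to whoever is currently connected."""
    seq = log.append(payload)
    for consumer in consumers:
        consumer.deliver(seq, payload)
    return seq


log = DurableLog()
consumer = ReplayingConsumer(log)
consumer.connect()
publish(log, [consumer], "a")
consumer.disconnect()
publish(log, [consumer], "b")  # missed live, but stored
publish(log, [consumer], "c")
consumer.connect()             # replays "b" and "c"
publish(log, [consumer], "d")
```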
Though I suspect that the Synadia people might do at least one more minor iteration of JetStream: it still seems a little more complex than it needs to be, unlike Core.