NATS vs MQTT vs Kafka vs Redis Queue vs Amazon SQS - how do they all stack up?
Depending on the throughput you need, the number of clients pulling/putting into the queue, and other things you can rule out certain options. For example, Redis queue will be one of the fastest choices - but is very limited in capacity (memory size). SQS is basically unlimited capacity, but can often have 150ms or more of time elapse between when you put the item in and when it's available.
For anyone looking to support multiple message patterns on one message bus, this is what you want to check out.
In AWS terms, it’s like SNS/SQS/Kinesis all rolled into one bus & very intuitive to work with.
As far as ideal use cases, we use NATS for https://plane.dev in two ways:
- As a message bus, it is a layer of abstraction on top of the network. Instead of each node needing to establish a connection to every node it needs to connect to, it just connects to a NATS cluster and messages are routed by subject. This is great for debugging because we can "wiretap" messages on a given subject pattern and verify what's being sent. We even have a service that listens on NATS subjects and conditionally turns events into Slack messages.
- It has a built-in Raft implementation (via JetStream), which we piggyback on when we need to create consensus among nodes.
I was looking to stream sensor data from a mobile device using MQTT, but the Eclipse Paho Java client (1.2.5) hasn't seen a release in 3 years and I found it to be pretty buggy. There are lots of open issues on the GitHub page.
NATS's only responsibility is to route messages in near-real-time from publishers to consumers. Messages are ephemeral and dropped immediately after delivery; if nobody is listening, the messages vanish. Messages are only queued temporarily in RAM if the consumer is busy, and they can get dropped if a consumer doesn't handle them fast enough. In short, NATS is very lightweight and fast, and designed for things that are lightweight and fast. It's like a kind of distributed socket mechanism, and works best as a communication primitive you build stuff on top of (like TCP or UDP) rather than a fully fledged system.
So it's very different from Kafka and other types of queues that are durable and database-like. Kafka is good for "fat pipes" that centralize data from producers into a log which is then consumed by massively parallel sets of consumers, and you don't constantly change this topology. NATS is good for networks of fast-changing producers and consumers that send small messages to each other, often one-on-one, although any fan-out topology works. It's great for firehose-type routing. For example, imagine you want your app to produce lots of different kinds of telemetry. Your app just sends messages to a topic "telemetry" or maybe a dotted topic like "telemetry.iostats" or "telemetry.errors". Then a client can "tap" into that topic by listening to it. If no client is listening, the firehose goes nowhere. But then a client can tap into "telemetry.errors" and get just the stream of error messages. Topics are just strings, so you can create unique topics for temporary things; an app can send a message to another app like "hey, do some work and then send the result to my temporary topic foobar726373".
NATS is particularly notable for its "just works" design. The clustering, for example, ties together brokers with no effort. Clients typically don't need any configuration at all, other than the name of a NATS server.
NATS can be used as a low-level component to build stateful stuff. NATS Jetstream is a Kafka-like solution that stores durable logs, and uses NATS as its communication protocol. Liftbridge is another one.
It's just a dumb communications bus for us. It has replaced most of our ETL needs with real-time events. If an order is placed online, an event is raised. Our accounting system can consume that event to create the sales order. Then the production system consumes the same event to add the job to the next production batch. Each production step produces events that can be used to update other systems, including real-time updates on the ecommerce system.
We use Jetstream so consumers don't need to be awake when producers create events.
This system spans the cloud and 3 physical locations. But to the consumers and producers it's one bus that they only have to authenticate with once.
At a guess, they are talking about applications built from the ground up to dynamically allocate resources using cloud providers' APIs directly, rather than relying on the assumption that fixed resources are already provisioned and the application runs within them.
I wonder if I'm right ...
Compared to HTTP/REST I miss the debuggability. All messages are sent and received over a websocket in binary form, and since there is no support in Chrome or Firefox, one has to tediously extract the JSON payload by hand and try to make sense of it.
Jetstream is still quite the mystery. It just doesn't want to work the way I want it to, especially on the JS client side, blocking script execution and timing out.
Then there's auth: after a few shutdowns and reboots I got locked out of my local installation. OIDC subject to NATS user mapping doesn't exist and has to be done manually.
So TL;DR the core functionality is great. Everything else seems to be WIP.
Ceph is a storage backend. Most of them tend to be built for or work well with Kubernetes.
- very fast startup
- low memory
- can be easily distributed or is stateless
I agree that the auth system is cumbersome. I wanted to use it on the edge for IoT devices, where a device only has the same permissions as the user it belongs to, but that's not very easy. Their auth isn’t very customizable.
NATS certainly has its quirks, but I can't recommend it highly enough if you need any sort of pub/sub or stream processing. It even has built-in key-value and object storage for when you need to store larger messages or content. I definitely prefer Jetstream to Kafka in pretty much every use case I can think of. At my current employer (https://cosmonic.com) we use NATS not only for wasmCloud, but we also stream log data and metrics and it keeps up with everything we throw at it with a very low footprint. Auth is kind of counterintuitive until you've spent some time with it, but NATS provides you with a ton of flexibility (docs here: https://docs.nats.io/running-a-nats-service/configuration/se...).
https://natsbyexample.com/ is a great resource, and can do a better job than I can in illustrating the various ways NATS can be used along with different deployment topologies.
Contradictory perhaps? So it's not what it means but it is what it means?
In order to achieve what you've said in vague terms you definitely do need to try to be stateless, have fast startup and reduce memory. Clear examples are things like lambda and fargate.
How can you be elastic if you have an application server that takes 10 minutes to start?
... the protocol is text-based like HTTP, with CR LF delimited fields, both for the client protocol, https://docs.nats.io/reference/reference-protocols/nats-prot..., and the cluster protocol, https://docs.nats.io/reference/reference-protocols/nats-serv... -- which means encoding overhead if your payloads are binary. So depending on your definition of performance, ymmv.
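To illustrate, after the initial INFO/CONNECT handshake, a complete publish/subscribe exchange on the wire looks roughly like this (subject and payload invented; every line ends in CR LF, and the number after the subject in PUB/MSG is the payload size in bytes):

```
C: SUB telemetry.errors 1
C: PUB telemetry.errors 20
C: {"err": "disk full"}
S: MSG telemetry.errors 1 20
S: {"err": "disk full"}
```

The "1" is a client-chosen subscription id, which the server echoes back in the MSG line so the client can route the delivery to the right subscriber.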
I really do not see how implementing an API across multiple languages is easier by making a new linefeed-based protocol, https://github.com/nats-io/nats-server/blob/0421c65c888bf381..., than just using code-generated JSON or gRPC (Protobuf or Flatbuffers). One could then write subscriptions/clustering algorithms in a protocol-neutral library.
> Json would be less efficient; gRPC adds tons of complexity and overhead.
Indeed. There aren’t many suitable specs around, and this protocol, albeit custom, is very easy to implement, as proven by the fact that there are well-maintained NATS clients in many different languages.
What's great is that NATS is written in Go, so we can easily embed it for testing and dev purposes. Furthermore, Synadia makes it super easy to run NATS across multiple regions.
But yes, to be able to replay without side effects you’ll want to make sure you’re setting up the consumers correctly. That may need some custom logic, but isn’t that necessary with any message queue?
[1]: https://docs.nats.io/using-nats/developer/develop_jetstream
If that indeed is what cloud native means, it sounds interesting. But the problem is that all these APIs and especially "managed" services are super proprietary and you'll vendor lock yourself pretty hard. But I suppose that ship has sailed a long time ago.
Authentication and integration with auth and secret providers are another distinguishing feature. I personally find "cloud native" software to be a pain to use locally because they usually come in the form of a docker-compose and kubernetes setup, and those absolutely gobble up ram and disk space.
I'm curious why RabbitMQ and AMQP are no longer part of such comparisons (not only in your comment; nobody else in this thread has mentioned them).
Or to put it more clearly: applications written in modern language runtimes, packaged in containers that can run on top of whatever orchestration is available, and using provider APIs and resources.
https://nightlies.apache.org/flink/flink-docs-master/docs/de...
We initially built our code around NATS Streaming; they then went ahead and deprecated that.
But since we saw so many big companies using NATS, we thought it might be a good idea to stick with it, and we did a year-long migration to their shiny new NATS JetStream push-based approach. From what I see now in the conversations, they are going to deprecate that too in favour of the pull-based approach, which is architecturally very different, so now we will have to somehow convince management of another rewrite. I am not sure if we should even rewrite or just move to another product at this point.
Dear NATS, please stop throwing away and rewriting protocols and products. Or make it such that the end client libraries would handle that upgrade automatically with a library upgrade.
We should have just stuck with the more traditional Kafka or RabbitMQ.
What I have also learnt is that when companies put big brand logos on their websites, it just means some random Dev from that company is using it for their side project or experimental mini project.
This is always so hard to figure out for B2B libraries.
Jetstream is also much easier to operate than Kafka IMO. Just a simple single binary with an easy to understand config.
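For example, a single-node JetStream setup needs little more than a config like this (the paths and limits below are placeholders, not recommendations):

```
# nats-server.conf
port: 4222
http_port: 8222          # HTTP monitoring endpoint

jetstream {
  store_dir: /var/lib/nats
  max_memory_store: 1GB
  max_file_store: 10GB
}
```

Then `nats-server -c nats-server.conf` and you're up; compare that with provisioning brokers, a controller quorum, and per-topic tuning for a Kafka cluster.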
Besides that, I'm also working on a UI solution that will help you get a better overview of your cluster: https://qaze.app/
And even NATS Streaming still works: it is deprecated, not removed.
Unlike databases, a good thing about messaging and streaming solutions is that you don't have to pick one: you can make them talk to each other as long as there are bridges. This also applies to different approaches to messaging/streaming provided by a single platform.
Same as "web scale" and "big data".
NATS does not have stateful storage. So when a consumer disconnects and reconnects, there is nowhere for NATS to store the messages temporarily. You can solve this by storing messages in a stateful storage first, then use NATS as a way to distribute them. You would need your own mechanism to replay messages on reconnect. This is coincidentally what Jetstream does. It uses NATS internally as a network protocol, but it's a separate thing.
But, you know, it's still a bit more complicated than opening the browser's debug console and inspecting request and response in the networking tab.
Yeah, you have operators, which are essentially the auth admins or orgs, however you'd like to look at it; then there are accounts, which are an alias for "project name"; and then you have users, which are "client name"s. And that's fine for their infrastructure only. The problem is, in the real world you have an external identity service (OIDC, IAM, ...) and the JWT this service creates includes a subject, but the NATS auth system has no support for external bridges. Also you have to decide: would you like to use operator/account/user, or a single JWT, or a passphrase, or manually managed user/password accounts? In a professional world, you'd have various customer applications, each divided by their operators, accounts and users. And then fine-grained user capabilities... headache time.
So what you have to do is distribute a default user certificate with a client (client meaning an actual client in the client/server sense), then do the sign-in/SSO process and have some middleware check the token from the OIDC auth process and check if a user with this subject exists; if not, create a new user and cert and send that cert back to the client, which will then create a new connection with the new certificate.
Very complicated.
The initial default client certificate is required for "logging in" via NATS. I guess it could be a standalone program, the OIDC-to-NATS user mapper/manager.
So compared to HTTP/REST, more work involved.
Though I suspect that the Synadia people might do at least one more minor iteration of Jetstream: it still seems a little more complex than it needs to be, unlike Core.
Whether that’s just marketing BS or real depends on the project. Whether it fits your particular kind of cloud environment is also a different story.
In the specific case of NATS I love how I can start with a single server on localhost, then maybe upgrade to a single fly.io instance, then later move to a larger AWS instance, then later add some fault-tolerance by turning a single server into a cluster, then later have multiple clusters in various AZs around the world, hosted on different cloud providers.
NATS makes all of these changes (and a lot more) a breeze. Any component or application using pub/sub, KeyVal, durable streams, or request/response will just keep working without any changes.
Disclaimer: I love NATS, it’s the most promising piece of infrastructure technology I have seen in a long time.
IMO it is a real waste of developer time to code up a new transport protocol without determining that the existing ones don't work or don't perform as well as needed. Multiply that by all the programming languages that need to be supported... when instead the client APIs could have been mostly code generated.
Although this may only apply to the "client-side" API: what is going into the message payload? JSON probably, or some other serialization format. I don't think a lot of developers are hand-writing parsers for their own payloads. The overhead of the JSON or Protobuf parser is already in there.
Derek seems like the kind of person who might know a thing or two about messaging systems.
I really like what NATS has become, and I do appreciate the simplicity of the protocol.
Pull does have advantages over push (e.g. one-to-one flow control, since the transfer of messages is initiated by the client via pull requests), and they are basically functionally equivalent (the only thing push can do that pull cannot is send a copy of all the messages to all the subscribers, should you ever need it). They both exist because historically push came first and pull came later.
As a developer using NATS JetStream you should really not have to worry about push or pull; you should just care whether you want to consume the messages via a callback, via an iterator, or via fetching batches. After that, whether pull or push is used under the covers is irrelevant to you.
And this is exactly how it is in the new JetStream API (https://github.com/nats-io/nats.go/tree/main/jetstream#readm...): you don't have to worry about push/pull anymore and you can consume in any of the 3 ways described above (callback, iterator, fetch batch). It's all a lot simpler and easier to use.