zlacker

Just because something can be used to do something doesn't mean it should. Kafka is specifically designed for this purpose, it is free, and it is easy to learn and use. If "starting with Postgres and then switching out when the time comes" saves money then I can understand. Otherwise use the right tool for the right job, right from the start.

replies(5): >>Edward+C2 >>anothe+Z2 >>jastr+C3 >>static+o4 >>cp9+Xp

>>petilo+(OP)
I've been working with Kafka since 0.8, and I mildly beg to differ on "easy to learn and use": to use it well, you have to design your applications around its semantics, and tuning it requires a lot of in-depth understanding of its mechanics.

And I've seen a looot of bad designs and misconfigurations.

All that said, I'm a massive fan of Kafka. I'm the first to admit it's a complex tool, but it needs to be for the problem space it targets.

>>petilo+(OP)
I've been on stage at KafkaConf demo-ing my Kafka SRE chops, and I would avoid Kafka until I am sure it is necessary.

'easy to learn and use' is a downright lie.

edit: link to this same topic being discussed a few weeks ago: https://news.ycombinator.com/item?id=28903614#28904103

>>petilo+(OP)
This is advice that seems reasonable but is actually pretty harmful.

Take a startup with a few users. The senior engineer decides they need pub/sub to ship a new feature. With Kafka, the team goes to learn about Kafka best practices, choose client libraries, and learn the Kafka quirks. They also need to spin up Kafka instances. They ship it in a month.

With postgres, they’ve got an MVP in a day, and shipped within a week.

replies(3): >>daenz+o5 >>petilo+g8 >>threes+yf

>>petilo+(OP)
Kafka is not a queue. Kafka's parallelism is limited by the number of partitions you allocate, and you have to be sure to avoid head of line blocking.

Not the case with a queue.
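To make the partition limit concrete, here is a minimal sketch of how partitions map to consumers in one consumer group. It assumes a simplified round-robin assignment, which is not Kafka's actual assignor; `assign_partitions` and the worker names are hypothetical. The point it illustrates is that each partition goes to exactly one consumer in the group, so consumers beyond the partition count get nothing to do.

```python
def assign_partitions(num_partitions, consumers):
    """Simplified round-robin assignment (an assumption, not Kafka's real assignor).

    Each partition is owned by exactly one consumer in the group.
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three partitions, five workers: two workers sit idle.
workers = ["w1", "w2", "w3", "w4", "w5"]
assignment = assign_partitions(3, workers)
idle = [w for w, parts in assignment.items() if not parts]
print(assignment)  # {'w1': [0], 'w2': [1], 'w3': [2], 'w4': [], 'w5': []}
print(idle)        # ['w4', 'w5']
```

With a plain queue, any idle worker can take the next message, so adding workers keeps helping.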

replies(3): >>jpgvm+V7 >>Spivak+Yd >>fnord7+uv

>>jastr+C3
I can set up an application to use AWS SQS or GCP PubSub in a day and it will scale without a second thought. I don't think it's productive to compare the worst case of scenario A and the best case of scenario B.

>>static+o4
Exactly. If you do want something very scalable that fixes these problems but shares a lot of architectural similarity with Kafka then you should check out Apache Pulsar.

>>jastr+C3
> With postgres, they’ve got an MVP in a day, and shipped within a week.

And the next week they realize they want reader processes to block until there is work to do. Oops that's not supported. Now you have to code that feature yourself... and soon you're reinventing Kafka.

replies(2): >>guywho+6m >>mlyle+Ru

>>static+o4
This needs to be sung from the rooftops every time Kafka is mentioned. It's an amazing tool but it is the wrong wrong wrong tool if you need a queue. It will bite you in the ass and you'll be left with someone breathing down your neck wondering why jobs are processing so slowly and why you can't just spin up more workers.

>>jastr+C3
How does any of this not apply equally to PostgreSQL?

Is this some magical database where you don't need to worry about access patterns, best practices or how it is deployed?

replies(2): >>KptMar+0j >>pritam+3e3

>>threes+yf
Yes, it's that magical database, up to a certain scale.

>>petilo+g8
That's where LISTEN comes in. It's very simple to get this loop perfectly correct.

>>petilo+(OP)
> Kafka is specifically designed for this purpose, it is free, and it is easy to learn and use

I think Kafka is great, but it is absolutely not “easy to learn and use”.

>>petilo+g8
The very source we're talking about describes how to block until there is work to do -- listener.Listen("ci_jobs_status_channel")
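For anyone who wants to see the shape of that blocking loop outside Go, here is a rough Python sketch. It assumes `conn` is an open psycopg2 connection in autocommit mode; `wait_for_notifications`, `handle` and `max_events` are hypothetical names for illustration, not something from the article. The worker sleeps in `select()` until Postgres pushes a NOTIFY, then drains whatever arrived.

```python
import select


def wait_for_notifications(conn, channel, handle, timeout=5.0, max_events=None):
    """Block until NOTIFY messages arrive on `channel`; pass each payload to `handle`.

    Assumes `conn` is a psycopg2 connection in autocommit mode.
    `max_events` (hypothetical) lets the loop stop after N payloads; None runs forever.
    """
    cur = conn.cursor()
    # The channel name is interpolated directly, so it must be a trusted identifier.
    cur.execute(f"LISTEN {channel};")
    seen = 0
    while max_events is None or seen < max_events:
        # Sleep until the connection's socket is readable or the timeout expires.
        readable, _, _ = select.select([conn], [], [], timeout)
        if not readable:
            continue  # timed out with no work; go back to waiting
        conn.poll()  # let the driver read pending notifications off the socket
        while conn.notifies:
            note = conn.notifies.pop(0)
            handle(note.payload)
            seen += 1
    return seen
```

A producer would then run something like `NOTIFY ci_jobs_status_channel, 'job-42';` after inserting a row. In practice the handler should still query the jobs table, because notifications sent while no listener is connected are not replayed.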

>>static+o4
What is "head of line" blocking?

replies(1): >>static+r92

>>fnord7+uv
A single partition is intended to be processed, more or less, by a single worker. If one of those messages, for whatever reason, ends up being really expensive, or flaky, you can't move on until you've handled it.

That's head of line blocking.
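A toy simulation shows the effect. It is a sketch under stated assumptions: message `i` is routed to partition `i % partitions`, each partition has exactly one worker that processes in order, and costs are abstract time units. The function names are hypothetical. It compares that against a shared queue where any free worker takes the next message.

```python
import heapq


def partitioned_finish_times(costs, partitions):
    """One in-order worker per partition; message i goes to partition i % partitions."""
    clock = [0] * partitions  # time at which each partition's worker is next free
    finish = []
    for i, cost in enumerate(costs):
        p = i % partitions
        clock[p] += cost  # must wait for everything ahead of it in this partition
        finish.append(clock[p])
    return finish


def shared_queue_finish_times(costs, workers):
    """Any free worker takes the next message from a single shared queue."""
    free_at = [0] * workers  # min-heap of times at which each worker becomes free
    finish = []
    for cost in costs:
        start = heapq.heappop(free_at)  # earliest available worker
        done = start + cost
        finish.append(done)
        heapq.heappush(free_at, done)
    return finish


# One expensive message (cost 100) followed by five cheap ones, two workers.
costs = [100, 1, 1, 1, 1, 1]
print(partitioned_finish_times(costs, 2))   # [100, 1, 101, 2, 102, 3]
print(shared_queue_finish_times(costs, 2))  # [100, 1, 2, 3, 4, 5]
```

In the partitioned case, the cheap messages that landed behind the expensive one finish at 101 and 102, even though the other worker was idle from time 3 onward. With the shared queue, the second worker clears all the cheap messages by time 5.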

>>threes+yf
> How does any of this not apply equally to PostgreSQL?

1. Postgres is easier to set up and run (than Kafka).
2. Most shops already have Postgres running (TFA is targeted at these shops).
3. Postgres is easier to adapt to changing access patterns (than Kafka).

----

> Is this some magical ...

Why must your adversary (Postgres) meet some mythical standard when your fighter (Kafka) doesn't meet even basic standards?