1. pilif+(OP)[view] [source] 2021-06-12 08:43:02
For pub/sub, I would recommend against using PostgreSQL at any kind of scale, because LISTEN ties up one connection completely and Postgres connections are very expensive compared to a Redis connection.
replies(3): >>dvfjsd+i >>gunnar+I >>postgr+pZ
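
For illustration, a minimal sketch of what "ties up one connection" means in practice, assuming Go with the jackc/pgx/v5 driver (the connection string and the "jobs" channel are made up). The connection below does nothing but wait for notifications, so it cannot be returned to a pool or reused for queries:

    // Dedicated LISTEN connection: held for the lifetime of the listener.
    package main

    import (
        "context"
        "log"

        "github.com/jackc/pgx/v5"
    )

    func main() {
        ctx := context.Background()

        conn, err := pgx.Connect(ctx, "postgres://localhost:5432/app")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close(ctx)

        if _, err := conn.Exec(ctx, "LISTEN jobs"); err != nil {
            log.Fatal(err)
        }

        for {
            // Blocks until a NOTIFY arrives; the connection is occupied here.
            n, err := conn.WaitForNotification(ctx)
            if err != nil {
                log.Fatal(err)
            }
            log.Printf("channel=%s payload=%s", n.Channel, n.Payload)
        }
    }
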
2. dvfjsd+i[view] [source] 2021-06-12 08:46:27
>>pilif+(OP)
I was looking for your comment here, thank you. I hope they solve this problem somehow in a future release.
3. gunnar+I[view] [source] 2021-06-12 08:52:03
>>pilif+(OP)
The other issue with LISTEN/NOTIFY is that a client will miss any notifications sent while it is not running (e.g. due to a crash or an update). An alternative would be logical replication and change data capture (CDC), as provided for Postgres by Debezium, for instance. That way, consumers won't miss events during downtime, as they'll resume reading from the replication slot at the last offset they had processed.

Disclaimer: I work on Debezium

replies(5): >>superm+a2 >>hardwa+d3 >>jbvers+o5 >>BiteCo+39 >>Michae+4N
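
To make the replication-slot point concrete, a rough sketch assuming Go with jackc/pgx/v5 and Postgres's built-in test_decoding output plugin (slot name and connection string are made up). Debezium itself streams over the replication protocol, but polling pg_logical_slot_get_changes shows the same property: changes queue up in the slot until the consumer fetches them, so nothing is lost while the consumer is down:

    // Poll a logical replication slot for changes accumulated since the last read.
    package main

    import (
        "context"
        "log"

        "github.com/jackc/pgx/v5"
    )

    func main() {
        ctx := context.Background()
        conn, err := pgx.Connect(ctx, "postgres://localhost:5432/app")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close(ctx)

        // One-time setup; the error is ignored if the slot already exists.
        _, _ = conn.Exec(ctx,
            "SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding')")

        // Returns (and consumes) every change written since the previous call,
        // even if this process was down in between.
        rows, err := conn.Query(ctx,
            "SELECT lsn::text, data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL)")
        if err != nil {
            log.Fatal(err)
        }
        defer rows.Close()

        for rows.Next() {
            var lsn, change string
            if err := rows.Scan(&lsn, &change); err != nil {
                log.Fatal(err)
            }
            log.Printf("lsn=%s change=%s", lsn, change)
        }
    }
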
4. superm+a2[view] [source] [discussion] 2021-06-12 09:10:40
>>gunnar+I
Isn't this also the case for Redis pub/sub?
replies(2): >>philjo+s3 >>Nick-C+vm
5. hardwa+d3[view] [source] [discussion] 2021-06-12 09:24:02
>>gunnar+I
Could you explain a little more about this (Debezium is awesome btw)? So after creating the proper replication slots, you set up change data capture via Debezium, then you listen to the CDC stream from the original DB you are connected to, correct?

This sounds like it might be vulnerable to dead consumers causing WAL to pile up without the right settings/periodic checks.

replies(1): >>gunnar+pO
6. philjo+s3[view] [source] [discussion] 2021-06-12 09:26:25
>>superm+a2
I'd need to check, but I think so.

Which is probably why, if you don't want to lose messages when consumers go down, a proper queue system (RabbitMQ, ActiveMQ, Amazon SQS, or Kafka if you don't care about ordering between partitions) is the way I'd go.

7. jbvers+o5[view] [source] [discussion] 2021-06-12 09:52:25
>>gunnar+I
This is by design and very common with pub/sub.
8. BiteCo+39[view] [source] [discussion] 2021-06-12 10:29:28
>>gunnar+I
Then use streams.
9. Nick-C+vm[view] [source] [discussion] 2021-06-12 13:08:40
>>superm+a2
For pub/sub, yes, that's correct. For full info though: Redis later added streams (in 5.x) for the don't-want-to-miss case: https://redis.io/topics/streams-intro
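
For comparison, a small sketch of the streams-based "don't-want-to-miss" pattern, assuming Go and the github.com/redis/go-redis/v9 client (the "events" stream name is made up). Unlike plain pub/sub, entries are persisted in the stream, so a consumer that was down can continue from the last ID it processed:

    // Append to and read from a Redis stream.
    package main

    import (
        "context"
        "log"

        "github.com/redis/go-redis/v9"
    )

    func main() {
        ctx := context.Background()
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

        // Producer: append an entry; Redis assigns the ID.
        id, err := rdb.XAdd(ctx, &redis.XAddArgs{
            Stream: "events",
            Values: map[string]interface{}{"type": "user.created", "user": "42"},
        }).Result()
        if err != nil {
            log.Fatal(err)
        }
        log.Println("appended", id)

        // Consumer: read everything after the last ID we processed.
        // "0" means from the beginning; a real consumer would persist its
        // position (or use consumer groups with XREADGROUP/XACK).
        streams, err := rdb.XRead(ctx, &redis.XReadArgs{
            Streams: []string{"events", "0"},
        }).Result()
        if err != nil {
            log.Fatal(err)
        }
        for _, s := range streams {
            for _, msg := range s.Messages {
                log.Printf("id=%s values=%v", msg.ID, msg.Values)
            }
        }
    }
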
10. Michae+4N[view] [source] [discussion] 2021-06-12 17:13:14
>>gunnar+I
Then what's the point of this feature as part of the DB, if it doesn't do persistence?
11. gunnar+pO[view] [source] [discussion] 2021-06-12 17:24:31
>>hardwa+d3
> then you listen to the CDC stream from the original DB you are connected to, correct?

Yes, exactly.

> This sounds like it might be vulnerable to dead consumers causing WAL to pile up without the right settings/periodic checks.

There are indeed some subtleties around this. You should monitor the backlog of replication slots, so as to identify inactive consumers as you say. Also, there are some corner cases you need to take care of: when listening to changes from a low-traffic database on a PG host that also has a high-traffic database, the replication slot may not be acknowledged often enough. Debezium mitigates this by optionally writing changes to a dummy heartbeat table in the low-traffic database, so as to advance the replication slot.

In Postgres 13 there's also a new option, "max_slot_wal_keep_size", which limits the WAL size retained by a replication slot. This prevents unbounded WAL growth, at the risk of consumers missing events if they are down for too long.

All in all, proper monitoring and alerting is key.
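
As a concrete starting point for that monitoring, a minimal sketch assuming Go with jackc/pgx/v5 (connection string made up): it reports how much WAL each replication slot is still holding back, which is the number to alert on to catch dead or lagging consumers. pg_replication_slots, pg_current_wal_lsn() and pg_wal_lsn_diff() have been available since Postgres 10:

    // Report retained WAL per replication slot.
    package main

    import (
        "context"
        "log"

        "github.com/jackc/pgx/v5"
    )

    func main() {
        ctx := context.Background()
        conn, err := pgx.Connect(ctx, "postgres://localhost:5432/app")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close(ctx)

        // restart_lsn marks the oldest WAL the slot still needs; the difference
        // to the current WAL position is what the slot forces Postgres to keep.
        rows, err := conn.Query(ctx, `
            SELECT slot_name, active,
                   pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
            FROM pg_replication_slots`)
        if err != nil {
            log.Fatal(err)
        }
        defer rows.Close()

        for rows.Next() {
            var slot, retained string
            var active bool
            if err := rows.Scan(&slot, &active, &retained); err != nil {
                log.Fatal(err)
            }
            log.Printf("slot=%s active=%t retained_wal=%s", slot, active, retained)
        }
    }
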

12. postgr+pZ[view] [source] 2021-06-12 18:51:47
>>pilif+(OP)
> LISTEN ties up one connection completely

I've seen this twice in this thread, but I don't know what that means. Can you explain a bit?

replies(1): >>sa46+e91
13. sa46+e91[view] [source] [discussion] 2021-06-12 20:33:52
>>postgr+pZ
When a client connects to Postgres, Postgres creates a new process just for that client. Separate processes are great for isolation, but it means Postgres connections are a bit expensive (this is an active area of improvement). Postgres really starts to struggle once you have a few thousand connections, even if those connections aren’t doing anything.

The common workaround is to use a connection pooler like PgBouncer so that clients reuse connections. This approach doesn’t work for LISTEN because a client typically listens for its entire lifecycle, so you can’t share connections in a pool.

replies(2): >>taffer+Nf1 >>postgr+Q32
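
A sketch of why pooling and LISTEN don't mix, assuming Go with jackc/pgx/v5's in-process pgxpool rather than PgBouncer (the constraint is the same; pool size and channel name are made up). Ordinary queries borrow a connection briefly and return it, while the listener keeps one acquired connection for its entire lifetime, so that connection is effectively lost to the pool:

    // Pooled queries vs. a connection permanently checked out for LISTEN.
    package main

    import (
        "context"
        "log"

        "github.com/jackc/pgx/v5/pgxpool"
    )

    func main() {
        ctx := context.Background()
        pool, err := pgxpool.New(ctx, "postgres://localhost:5432/app?pool_max_conns=10")
        if err != nil {
            log.Fatal(err)
        }
        defer pool.Close()

        // Typical pooled usage: the connection goes back to the pool immediately.
        var now string
        if err := pool.QueryRow(ctx, "SELECT now()::text").Scan(&now); err != nil {
            log.Fatal(err)
        }
        log.Println("server time:", now)

        // LISTEN usage: this connection is checked out and never released.
        listenConn, err := pool.Acquire(ctx)
        if err != nil {
            log.Fatal(err)
        }
        if _, err := listenConn.Exec(ctx, "LISTEN jobs"); err != nil {
            log.Fatal(err)
        }
        for {
            n, err := listenConn.Conn().WaitForNotification(ctx)
            if err != nil {
                log.Fatal(err)
            }
            log.Printf("channel=%s payload=%s", n.Channel, n.Payload)
        }
    }
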
14. taffer+Nf1[view] [source] [discussion] 2021-06-12 21:41:51
>>sa46+e91
You need one connection per worker thread, and realistically you would only have one worker per CPU core. So how many LISTENing connections do you really need?
replies(1): >>sa46+nm1
15. sa46+nm1[view] [source] [discussion] 2021-06-12 22:49:10
>>taffer+Nf1
You generally have more than one database connection per thread. As an example, consider Node.js, which acts like a single-threaded process. You’d probably want to be able to handle more than one database query concurrently, since network latency tends to dominate OLTP requests.

How you set up LISTEN and NOTIFY is app-dependent. In a multi-tenant database, you could have one NOTIFY channel per tenant.

As you scale, you probably do something that listens in a smarter way, maybe one Go channel per client with a single LISTEN instead of one database connection per client. The downside is that now the app code is responsible for tenant isolation instead of the database.

replies(1): >>taffer+wj2
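
A sketch of that "one Go channel per client with a single LISTEN" idea, assuming jackc/pgx/v5, a single notification channel named tenant_events, and a payload convention of "<tenant>:<message>" (all made up for illustration). Note how tenant isolation now lives in the dispatch code rather than in Postgres:

    // One LISTEN connection fanning out to per-tenant Go channels.
    package main

    import (
        "context"
        "log"
        "strings"
        "sync"

        "github.com/jackc/pgx/v5"
    )

    type dispatcher struct {
        mu   sync.Mutex
        subs map[string]chan string // tenant ID -> subscriber channel
    }

    func (d *dispatcher) subscribe(tenant string) <-chan string {
        d.mu.Lock()
        defer d.mu.Unlock()
        ch := make(chan string, 16)
        d.subs[tenant] = ch
        return ch
    }

    func (d *dispatcher) dispatch(payload string) {
        // Illustrative payload convention: "<tenant>:<message>".
        tenant, msg, ok := strings.Cut(payload, ":")
        if !ok {
            return
        }
        d.mu.Lock()
        ch := d.subs[tenant]
        d.mu.Unlock()
        if ch != nil {
            ch <- msg
        }
    }

    func main() {
        ctx := context.Background()
        conn, err := pgx.Connect(ctx, "postgres://localhost:5432/app")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close(ctx)

        if _, err := conn.Exec(ctx, "LISTEN tenant_events"); err != nil {
            log.Fatal(err)
        }

        d := &dispatcher{subs: make(map[string]chan string)}
        go func() {
            for msg := range d.subscribe("acme") {
                log.Println("acme got:", msg)
            }
        }()

        // Single database connection; fan-out happens in application code.
        for {
            n, err := conn.WaitForNotification(ctx)
            if err != nil {
                log.Fatal(err)
            }
            d.dispatch(n.Payload)
        }
    }
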
16. postgr+Q32[view] [source] [discussion] 2021-06-13 08:09:34
>>sa46+e91
> This approach doesn’t work for LISTEN because typically a client will listen for its entire lifecycle so you can’t share connections in a pool.

But you only need one connection for LISTEN per database, total. So I'm confused why this is made out to be a big issue.

17. taffer+wj2[view] [source] [discussion] 2021-06-13 11:43:27
>>sa46+nm1
> [...] since network latency tends to dominate OLTP requests.

You are absolutely right about that. However, in this case we are talking about background workers. I would argue that background workers, such as an image resizing worker, are typically CPU-bound and do not perform high-volume OLTP queries. Therefore, a background worker does not require multiple database connections per thread as a web server would.
