zlacker

[return to "Do you really need Redis? How to get away with just PostgreSQL"]
1. petepe+V 2021-06-12 07:08:51
>>hyzyla+(OP)
I've used PostgreSQL in the first two scenarios and would love an opportunity to use it in the third. It's worked really well and hasn't caused any problems under decent loads.

The one Redis feature I'd love to see supported in PostgreSQL is the ability to set a TTL on a record. On a recent project where we were only allowed to keep personal data for a limited time, it was a must-have feature, so we had to use Redis for that purpose instead.

2. sologo+m1 2021-06-12 07:13:40
>>petepe+V
Wouldn’t a simple scheduled batch job be enough to periodically go through and drop records older than N days?

That would also give you audit logs, etc., as well as the flexibility to adjust business logic without updating the TTL on all records.
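
A minimal sketch of such a job, assuming the pg_cron extension is available (a plain OS cron calling psql works just as well); the user_data table and 30-day window are made up for illustration:

    -- Run every night at 03:00 and purge anything past its retention window.
    -- Table/column names are hypothetical.
    SELECT cron.schedule(
      'purge-expired-user-data',
      '0 3 * * *',
      $$DELETE FROM user_data WHERE created_at < now() - interval '30 days'$$
    );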

3. tbarbu+K5 2021-06-12 08:03:31
>>sologo+m1
Deleting a large amount of data on PostgreSQL is expensive. First you need an index on a column to select the expired data, then you actually have to delete the rows, which creates a lot of garbage and is heavy and slow.

Creating and deleting a lot of data on PG is a pain because of MVCC and vacuum. One useful trick is to partition the data across tables and truncate/drop entire tables: DROP/TRUNCATE is near-instant and reclaims the space immediately.
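
For example, with declarative partitioning (Postgres 10+); the events table here is hypothetical:

    -- Parent table partitioned by range on the timestamp.
    CREATE TABLE events (
      id         bigint NOT NULL,
      payload    jsonb,
      created_at timestamptz NOT NULL
    ) PARTITION BY RANGE (created_at);

    -- One partition per day; create these ahead of time (or from a cron job).
    CREATE TABLE events_2021_06_12 PARTITION OF events
      FOR VALUES FROM ('2021-06-12') TO ('2021-06-13');

    -- Expiring a whole day is a metadata operation,
    -- not millions of row deletes followed by vacuum.
    DROP TABLE events_2021_06_12;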

4. plank_+T7 2021-06-12 08:27:42
>>tbarbu+K5
You can do the equivalent by adding a timestamp column with an index on your table and filtering with “where now() - timestamp <= TTL” (or some computationally easier formulation that doesn’t require math in the query, like storing the expiry time directly).
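
E.g., with a hypothetical sessions table: storing the precomputed expiry keeps the predicate a plain comparison, so the index actually gets used:

    -- Store the expiry instead of doing date math against a TTL per row.
    CREATE TABLE sessions (
      token      text PRIMARY KEY,
      data       jsonb,
      expires_at timestamptz NOT NULL
    );
    CREATE INDEX ON sessions (expires_at);

    -- Reads simply ignore expired rows...
    SELECT data FROM sessions WHERE token = $1 AND expires_at > now();

    -- ...and a periodic job reclaims them.
    DELETE FROM sessions WHERE expires_at <= now();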

TTL on systems like Cassandra is pretty ugly, and deleting data is hard as you scale no matter how you do it. I don’t think Postgres would be able to implement a TTL that is magically better than the rest.

5. neziru+G9 2021-06-12 08:46:11
>>plank_+T7
Deleting whole partitions is a generally useful strategy. It's like the difference between single inserts and batch inserts (often a huge performance difference, and much lower IO).

Since you mentioned Cassandra and TTL, I'll mention ClickHouse, which has very nice TTL options; splitting into smaller partitions and using "ttl_only_drop_parts=1" has proven itself in production with big data ingestion rates.
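
Roughly what that looks like in ClickHouse (table and column names are illustrative, the 30-day TTL is just an example):

    CREATE TABLE events (
      created_at DateTime,
      payload    String
    )
    ENGINE = MergeTree
    PARTITION BY toDate(created_at)    -- small daily parts
    ORDER BY created_at
    TTL created_at + INTERVAL 30 DAY   -- expire rows after 30 days
    SETTINGS ttl_only_drop_parts = 1;  -- drop whole parts instead of rewriting them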

Last but not least, I almost always prefer Postgres for data storage needs; you can trust it to be safe and fast enough. Only some specific situations warrant other solutions, but it's a long way until that point (if ever), and it's better not to optimize too early.
