zlacker

[return to "Do you really need Redis? How to get away with just PostgreSQL"]
1. petepe+V[view] [source] 2021-06-12 07:08:51
>>hyzyla+(OP)
I've used PostgreSQL in the first two scenarios and would love an opportunity to use it in the third. It's worked really well and hasn't caused any problems under decent loads.

The one feature of Redis I'd love to have supported in PostgreSQL is the ability to set a TTL on a record. On a recent project where we could only retain personal data for so long, it was a must-have feature, so we had to use Redis for that purpose instead.

2. sologo+m1[view] [source] 2021-06-12 07:13:40
>>petepe+V
Wouldn’t a simple scheduled batch job be enough to go through periodically and drop records older than N days?

This would also give you audit logs, etc. As well as flexibility to adjust business logic without updating the TTL on all records.
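A minimal sketch of such a batch job, assuming an illustrative `sessions` table with a `created_at` timestamp and the pg_cron extension for scheduling (all names here are made up):

```sql
-- Assumed table: sessions (id bigint, payload jsonb, created_at timestamptz)
-- An index on the timestamp keeps the periodic delete from scanning everything:
CREATE INDEX IF NOT EXISTS sessions_created_at_idx ON sessions (created_at);

-- With pg_cron installed, schedule the purge to run nightly at 03:00:
SELECT cron.schedule(
  'purge-expired-sessions',
  '0 3 * * *',
  $$DELETE FROM sessions WHERE created_at < now() - interval '30 days'$$
);
```

The retention interval lives in one place here, which is the flexibility the parent comment mentions: changing the policy means editing one statement rather than rewriting TTLs on every record.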

3. tbarbu+K5[view] [source] 2021-06-12 08:03:31
>>sologo+m1
Deleting a large amount of data in PostgreSQL is expensive. First you need an index on a column to find the expired rows; then the DELETE itself creates a lot of dead tuples, which makes it heavy and slow.

Creating and deleting a lot of data in PG is a pain because of MVCC and vacuum. One useful trick is to partition the data into per-period tables and truncate/drop entire tables: DROP/TRUNCATE is instant and reclaims space immediately.
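A rough sketch of that trick using declarative range partitioning (table and column names are invented for illustration):

```sql
-- Parent table partitioned by day on the creation timestamp:
CREATE TABLE events (
  id         bigint NOT NULL,
  payload    jsonb,
  created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per day; a maintenance job creates these ahead of time:
CREATE TABLE events_2021_06_12 PARTITION OF events
  FOR VALUES FROM ('2021-06-12') TO ('2021-06-13');

-- Expiring a whole day is then effectively instant and reclaims space
-- immediately, with no dead tuples for vacuum to chew through:
DROP TABLE events_2021_06_11;
```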

4. squigg+Ab[view] [source] 2021-06-12 09:10:42
>>tbarbu+K5
> Creating and deleting a lot of data on PG is a pain because of MVCC and vacuum. One useful trick is to partition data into tables and to truncate/drop entire tables, drop/truncate is instant and reclaims space immediately.

In this case, the requirement is that user data must only be kept for a certain time and not longer.

If that time is a property of the record's creation and we're allowed to keep the data with an error of up to a day, I guess it's easy: we partition the table by day and drop any partition older than 28 days. Sometimes a record will be closer to 29 days old when it's deleted, but we've accepted that, and it's easy enough to write queries so that a record is unavailable to the application once it's more than exactly 28*24*60*60 seconds old, if that's our constraint.
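One way to enforce the exact cutoff while the daily partitions lag behind by up to a day (sketch; the `events`/`created_at` names are assumed from the example above):

```sql
-- Just before its partition is dropped, a row can be up to ~29 days old,
-- so the application reads through a view that enforces the hard limit:
CREATE VIEW events_live AS
  SELECT *
  FROM events
  WHERE created_at >= now() - interval '28 days';
```

The partition drop handles physical expiry cheaply; the view handles the precise legal/business cutoff.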

If the requirement is to keep data based on the last time it was used in a certain way, we'd need to move it from one partition to another in order to keep it. For instance, if we can keep data for 28 days after the user last logs in, we can't just drop the partition - unless we've moved the user's data each day they log in.

If we have that kind of a constraint, where data lifespan is based on properties that change over the lifetime of the data, is partitioning + drop actually a useful approach? The drop will still be instant, I guess, but it's the movement of data over its lifetime that concerns me here.
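For what it's worth, since PostgreSQL 11 an UPDATE that changes the partition key automatically moves the row to the correct partition (implemented as a delete plus insert under the hood), so "moving the user each day they log in" can be a single statement. A sketch, assuming a hypothetical table partitioned by a last-activity timestamp rather than by creation time:

```sql
-- users_activity is range-partitioned by day on last_active_at.
-- Refreshing the timestamp on login re-routes the row into today's
-- partition; partitions older than 28 days can still be dropped wholesale.
UPDATE users_activity
   SET last_active_at = now()
 WHERE user_id = 42;
```

The concern in the parent comment still stands, though: each such move is a full row rewrite (delete + insert), so a hot user base churns dead tuples daily - the drop is cheap, but the moves are not free.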
