zlacker

Do you really need Redis? How to get away with just PostgreSQL

submitted by hyzyla+(OP) on 2021-06-12 06:57:14 | 841 points 454 comments
[view article] [source] [go to bottom]

NOTE: showing posts with links only
8. _ugfj+z2[view] [source] 2021-06-12 07:29:54
>>hyzyla+(OP)
You really don't need anything fancy to implement a queue using SQL. You need a table with a primary id and a "status" field. An "expire" field can be used instead of the "status". We used the latter because it allows easy retries.

1. SELECT item_id WHERE expire = 0. If this is empty, no items are available.

2. UPDATE SET expire = some_future_time WHERE item_id = $selected_item_id AND expire = 0. Then check whether UPDATE affected any rows. If it did, item_id is yours. If not, loop. If the database has a sane optimizer it'll note at most one document needs locking as the primary id is given.

All this needs is a very weak property: document level atomic UPDATE which can return whether it changed anything. (How weak? MongoDB could do that in 2009.)

Source code at https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/... (We cooked this up for Drupal in 2009 but I am reasonably sure we didn't invent anything new.)

Of course, this is not the fastest job queue there is but it is quite often good enough.
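
A minimal sketch of the two-step claim above, in Python with psycopg2 (the queue/item_id/expire names follow the comment; everything else, including the table layout, is an assumption):

    import time
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    conn.autocommit = True  # each statement runs in its own transaction

    def claim_item(lease_seconds=300):
        """Claim one queue item, or return None if nothing is available."""
        with conn.cursor() as cur:
            while True:
                cur.execute("SELECT item_id FROM queue WHERE expire = 0 LIMIT 1")
                row = cur.fetchone()
                if row is None:
                    return None  # queue is empty
                item_id = row[0]
                # Guarded UPDATE: succeeds only if nobody claimed the item first.
                cur.execute(
                    "UPDATE queue SET expire = %s"
                    " WHERE item_id = %s AND expire = 0",
                    (int(time.time()) + lease_seconds, item_id),
                )
                if cur.rowcount == 1:
                    return item_id  # the item is ours
                # Lost the race; loop and try the next candidate.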

19. mossel+14[view] [source] 2021-06-12 07:45:55
>>hyzyla+(OP)
For Rails I've used que in the past, which is a job queue adapter for Postgres. It supports ActiveJob.

https://github.com/que-rb/que

22. skunkw+E4[view] [source] 2021-06-12 07:52:51
>>hyzyla+(OP)
Hopefully one day projects like que[1][2] will become stable and battle-tested enough to use in a production environment. Until then I'll be using something like sidekiq (if you're going for a client-side job queue, e.g. the clients don't really know about each other and only have rate-limiting, not true throttling).

With Postgres you also need to worry about high churn, especially since you are creating/locking/deleting rows constantly. This can be alleviated through a variety of means; personally I would use per-day table partitioning and truncate older partitions on a cron. There's also the sharp increase in database connections to the host now required.

That's ignoring the literal elephant in the room: synced writes to the store. Redis can be used quite effectively in a blocking manner with RPOPLPUSH/LMOVE (6.2+) for a reliable queue: an item can't be lost, because the pop from one list and the push onto another are done together atomically.

[1] https://github.com/que-rb/que [2] https://gist.github.com/chanks/7585810
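
A minimal sketch of the reliable-queue pattern mentioned above, using the redis-py client (the list names and the ack/reaper steps are assumptions, not part of the comment):

    import redis

    r = redis.Redis()

    def reliable_pop(timeout=5):
        """Atomically move one item from 'jobs' to 'jobs:processing' and return it.

        If the worker crashes mid-job, the item still sits in 'jobs:processing'
        and can be re-queued by a reaper; it is never silently lost.
        """
        # BRPOPLPUSH blocks up to `timeout` seconds; on Redis 6.2+ the
        # non-deprecated equivalent is BLMOVE.
        return r.brpoplpush("jobs", "jobs:processing", timeout=timeout)

    def acknowledge(item):
        """Remove the item from the processing list once the job is done."""
        r.lrem("jobs:processing", 1, item)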

30. avinas+L5[view] [source] 2021-06-12 08:03:47
>>hyzyla+(OP)
Most of the time Redis's distributed lock works fine; however, one should know that it's not failproof and you might run into really weird bugs.

references:

0 - https://aphyr.com/posts/283-jepsen-redis

1 - https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...

◧◩
63. Twisol+Xa[view] [source] [discussion] 2021-06-12 09:02:38
>>_ugfj+z2
For other readers, the `UPDATE` step is an exact analogue of the "compare-and-set" atomic instruction [0]. It's really cool to see how you've realized it in SQL!

As a capability, compare-and-swap has an infinite consensus number [1], meaning it's sufficient to implement wait-free consensus algorithms with an arbitrary number of participants. That makes it a perfect fit for managing a scalable pool of workers that need to coordinate on consuming from a queue!

[0] https://en.wikipedia.org/wiki/Compare-and-swap

[1] https://en.wikipedia.org/wiki/Consensus_(computer_science)#C...

◧◩
67. shaico+Eb[view] [source] [discussion] 2021-06-12 09:12:05
>>mossel+14
Another alternative is GoodJob:

https://island94.org/2020/07/introducing-goodjob-1-0

https://github.com/bensheldon/good_job

71. BiteCo+dc[view] [source] 2021-06-12 09:18:24
>>hyzyla+(OP)
Oh but redis is much more than that.

It's so simple that any little script can interact with it in seconds, instead of having to craft complex SQL or import your wrapper. You can call redis from the weirdest places, like from inside an nginx config file. You don't even need a compiled driver, if that matters.

It's easier to get it to perform. It's possible to get great performance with PostgreSQL, but you get it for free with redis. Very hard to screw up. Reads, writes? N+1? Batching? Who cares, it will be fast no matter the concurrency or the load.

Sure, you can expire with Postgres, but having done so in practice, it's way harder than it looks to get right. With redis, you don't have to care. Set a cache, set a lock with a timeout, store data you are not sure you need. It will all disappear.

Redis is not just key/value. You have sets, so ensuring uniqueness is a piece of cake: no constraints to define, no insertion to check against. And of course you have sorted sets, which are kept ordered at insertion by whatever number you pair each value with, such as a timestamp/score/index, and which you can truncate by range for extra magic.

And then you have bonuses like HyperLogLog, which you need an extension for in Postgres. Super handy for stats.

Finally, you have streams, which for most apps will fill the gap for a time-series database or a way to store your app logs. All that with 2ms latency at 10k requests/s. None of my projects ever needed more than 1k/s though, even one with 1M users/month.

You have all of that with a dead simple install and basically no maintenance.

In fact, redis by itself consumes almost no resources; it's not even worth it to not have it in your toolset. I just install it by default on all my projects: I never regretted it, there is always something it can do for you. If not now, just wait a bit; it's costing you nothing, and something will come up.

So no, let's use Postgres for what it's great at, which is being a robust all-purpose database. Redis is a fantastic complement to it, not a competitor, so just use both.

Unless you are Google-sized, there is little chance you will reach a stage where you need to migrate off either of them.

It's one of those technologies that seem too good to be true, like SQLite or Python.

My only regret is that it doesn't exist on Windows.

P.S.: if you need caching and can't afford redis for a small Python script, use DiskCache, it's awesome: http://www.grantjenks.com/docs/diskcache/index.html
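
A few of the building blocks mentioned above (expiry, locks, sets, sorted sets) in redis-py; the key names and TTLs are made up for illustration:

    import time
    import redis

    r = redis.Redis()

    # Cache with a TTL: no cleanup job needed, the key simply disappears.
    r.set("cache:user:42", '{"name": "Ada"}', ex=300)

    # Lock with a timeout: nx=True means "only set if not already set".
    got_lock = r.set("lock:report", "worker-1", nx=True, ex=30)

    # Sets for uniqueness: SADD returns 1 only the first time a member is added.
    first_visit_today = r.sadd("visitors:2021-06-12", "user:42") == 1

    # Sorted sets kept ordered by score (here a timestamp), truncated by range.
    r.zadd("events", {"login:user:42": time.time()})
    r.zremrangebyscore("events", 0, time.time() - 86400)  # drop entries older than a day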

◧◩
73. hardwa+oc[view] [source] [discussion] 2021-06-12 09:20:49
>>_ugfj+z2
Don't forget SKIP LOCKED[0]!

[0]: https://blog.2ndquadrant.com/what-is-select-skip-locked-for-...

◧◩◪◨
79. hardwa+gd[view] [source] [discussion] 2021-06-12 09:31:20
>>tehbea+y4
Postgres is not as automatic as other tools, but that is mostly an artifact of it having been around for so long, with the focus being on other things. Few projects have been around and stayed as relevant as Postgres.

Most of the time, you really don't need to scale postgres more than vertically (outside of the usual read replicas), and if you have tons of reads (that aren't hitting cache, I guess), then you can scale reads relatively easily. The problem is that the guarantees that postgres gives you around your data are research-level hard -- you either quorum or you 2pc.

Once you start looking into solutions that scale easily, if they don't ding you on performance, things get murky really quick and all of a sudden you hear a lot of "read-your-writes" or "eventual consistency" -- they're weakening the problem so it can be solved easily.

All that said -- Citus and PostgresXL do exist. They're not perfect by any means, but you also have solutions that scale at the table-level like TimescaleDB and others. You can literally use Postgres for something it was never designed for and still be in a manageable situation -- try that with other tools.

All that said, KeyDB[0] looks pretty awesome. Multithreaded, easy clustering, and flash-as-memory in a pinch; I'm way more excited to roll that out than I am Redis these days.

[0]: https://github.com/EQ-Alpha/KeyDB

◧◩◪
80. chx+hd[view] [source] [discussion] 2021-06-12 09:31:43
>>Twisol+Xa
Yes, the UPDATE command is the exact equivalent of LOCK CMPXCHG (the SELECT can be seen as computing the memory address). So the discussion about that in the comments of https://stackoverflow.com/a/59022356/308851 totally applies: if two queue runner threads pick the same item, exactly one will succeed, so it can't happen that both keep retrying the same item. So there's no busy wait (reminder: a busy wait is where a thread does nothing but test a condition); a worker just goes over every candidate until one succeeds.
84. hardwa+je[view] [source] 2021-06-12 09:45:19
>>hyzyla+(OP)
A blog post series I've been meaning to write for over 3 years now:

* Every database a Postgres 1: Key/Value store

* Every database a Postgres 2: Document stores

* Every database a Postgres 3: Logs (Kafka-esque)

* Every database a Postgres 4: Timeseries

* Every database a Postgres 5: Full Text Search

* Every database a Postgres 6: Message Queues

Low key, you could make almost every single type of database a modern startup needs out of Postgres, and get the benefits (and drawbacks) of Postgres everywhere.

Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.

If anyone from 2ndQuadrant/Citus/EDB sees this, please do a series like this, make the solutions open source, and I bet we'd get some pretty decent performance out of Postgres compared to the purpose-built solutions (remember, TimescaleDB did amazingly well compared to InfluxDB, a purpose-built tool, not too long ago).

New features like custom table access methods and stuff also shift the capabilities of Postgres a ton. I'm fairly certain I could write a table access method that "just" allocated some memory and gave it to a redis subprocess (or even a compiled-in version) to use.

[EDIT] - It's not clear but the listing is in emacs org mode, those bullet points are expandable and I have tons of notes in each one of these (ex. time series has lots of activity in postgres -- TimescaleDB, native partitioning, Citus, etc). Unfortunately the first bullet point is 43 (!) bullet points down. If someone wants to fund my yak shaving reach out, otherwise someone signal boost this to 2Q/Citus/EDB so professionals can take a stab at it.

[EDIT2] - I forgot some, Postgres actually has:

- Graph support, w/ AgensGraph now known as AGE[0]

- OLAP workloads with Citus Columnar[1] (and zedstore[2]).

[0]: https://age.apache.org

[1]: https://www.citusdata.com/blog/2021/03/05/citus-10-release-o...

[2]: https://github.com/greenplum-db/postgres/tree/zedstore

88. fasteo+Ue[view] [source] 2021-06-12 09:53:04
>>hyzyla+(OP)
For use case 1 (job queue) I can only recommend beanstalkd[1]. Simple TCP plain-text protocol. Install and forget. Plus you get some really useful features like deferred jobs, burying, kicking, job priorities, etc.

We have literally processed tens of billions of jobs without a single failure.

Old and simple technology that just works

[1] https://beanstalkd.github.io/

◧◩
89. rkwz+gf[view] [source] [discussion] 2021-06-12 09:56:53
>>hardwa+je
> Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.

It's also useful when you want to quickly build a "good enough" version of a feature like search so you can get it in front of your users fast and iterate on their feedback. Most of the time, they'd be quite happy with the results and you don't have to spend time on something like managing Elasticsearch.

I wrote a post on how you can use postgres to add search capabilities with support for queries like

jaguar speed -car

ipad OR iphone

"chocolate chip" recipe

http://www.sheshbabu.com/posts/minimal-viable-search-using-p...
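
For readers who want a feel for it: Postgres 11+'s websearch_to_tsquery understands exactly these query styles. A rough sketch in Python/psycopg2, assuming a hypothetical articles(title, body) table (the linked post precomputes an indexed tsvector column, which you'd want in practice):

    import psycopg2

    conn = psycopg2.connect("dbname=app")

    def search(query):
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                SELECT title
                FROM articles
                WHERE to_tsvector('english', title || ' ' || body)
                      @@ websearch_to_tsquery('english', %s)
                ORDER BY ts_rank(to_tsvector('english', title || ' ' || body),
                                 websearch_to_tsquery('english', %s)) DESC
                LIMIT 20
                """,
                (query, query),
            )
            return [row[0] for row in cur.fetchall()]

    search("jaguar speed -car")        # exclude a term
    search("ipad OR iphone")           # either term
    search('"chocolate chip" recipe')  # exact phrase plus a term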

94. Rygu+hg[view] [source] 2021-06-12 10:08:01
>>hyzyla+(OP)
Don't forget to use Partial Indexes on jobs/tasks tables with queries like WHERE status = 'pending'

More: https://use-the-index-luke.com/sql/where-clause/partial-and-...
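
A minimal sketch of that advice (the jobs table and created_at column are assumptions for illustration):

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        # Index only the rows the dequeue query actually reads; the index stays
        # tiny even if the table accumulates millions of finished jobs.
        cur.execute(
            "CREATE INDEX IF NOT EXISTS jobs_pending_idx"
            " ON jobs (created_at) WHERE status = 'pending'"
        )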

◧◩◪
104. hardwa+zh[view] [source] [discussion] 2021-06-12 10:19:38
>>des1nd+Ag
Just repeating what others have said:

- Postgres is probably already running (it's pretty good for OLTP workloads)

- Operational ease and robustness

- Cloud support everywhere for Postgres

- People know how to backup and restore postgres

- Sometimes Postgres will beat or wholly subsume your specialized system and be a good choice

- Postgres has ACID compliance and a very good production-ready grasp on the research-level problems involved in transactions. I've never met an etcd/zookeeper cluster I didn't wish was simply a postgres table. Imagine being able to actually change your cache and your data at the same time and ensure that both changes happen or none of them happen (this is a bit vaporware-y, because locks and access pattern discrepancies and stuff, but bear with me). You're much less likely to see Postgres fail a Jepsen test[0]

[0]: https://jepsen.io

◧◩
115. siscia+Oi[view] [source] [discussion] 2021-06-12 10:34:05
>>chmod7+K
It was the reason why https://zeesql.com, formerly known as https://RediSQL.com, was written.

My clients seem rather happy with the project.

◧◩◪◨⬒⬓
177. hardwa+kt[view] [source] [discussion] 2021-06-12 12:41:06
>>victor+3r
It really looks absolutely amazing. I feel guilty because I want to run a service on it; there's almost no downside to running it everywhere you'd normally run Redis.

Also in the cool-redis-stuff category:

https://github.com/twitter/pelikan

Doesn't have the feature set that KeyDB has, but both of these pieces of software feel like they could be the basis of a cloud redis product that would be really efficient and fast. I've got some plans to do just that.

◧◩◪◨
185. Nick-C+Uv[view] [source] [discussion] 2021-06-12 13:08:40
>>superm+zb
For pub/sub, yes, that's correct. For full info though: Redis later added streams (in 5.x) for the don't-want-to-miss case: https://redis.io/topics/streams-intro
◧◩◪◨⬒⬓
186. otoole+cw[view] [source] [discussion] 2021-06-12 13:10:30
>>mirekr+Rs
rqlite author here. Happy to answer any questions about it.

https://github.com/rqlite/rqlite

191. truth_+Rx[view] [source] 2021-06-12 13:29:17
>>hyzyla+(OP)
While you are at it, don't forget to use UNLOGGED tables. UNLOGGED == In Memory.

But if you must use a disk-based table for job queueing, set fillfactor = 50. This takes care of heavy updates.

Indexes are helpful but cost memory and CPU, so always make sure you partition the table based on job_type for a performant pending-job query.

I wouldn't recommend using LISTEN/NOTIFY unless you are okay with "at most once" semantics. I have used the disk-table-based approach for PUB/SUB to replace Kafka. A more fine-tuned approach will also allow (job_type, consumer_group, publisher) as a partition key.

Ref - https://www.postgresql.org/docs/current/sql-createtable.html
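
A rough sketch of the UNLOGGED-plus-fillfactor setup described above (the schema is made up; note that an UNLOGGED table skips WAL writes and its contents are truncated after a crash, which is usually acceptable for a transient queue):

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE UNLOGGED TABLE IF NOT EXISTS job_queue (
                job_id    bigserial PRIMARY KEY,
                job_type  text NOT NULL,
                payload   jsonb,
                status    text NOT NULL DEFAULT 'pending'
            ) WITH (fillfactor = 50)
        """)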

◧◩◪◨⬒⬓⬔
203. mirekr+0B[view] [source] [discussion] 2021-06-12 13:57:32
>>otoole+cw
Are you planning on adding websockets or something similar in the near future to support things like data change notifications [0]?

[0] https://www.sqlite.org/c3ref/update_hook.html

◧◩◪◨⬒⬓⬔⧯
209. hughrr+yB[view] [source] [discussion] 2021-06-12 14:01:12
>>deskam+0u
Um, I'm not suggesting using Redis. In actual fact I said elsewhere I wouldn't use Redis for queues.

As for triggers, reporting, audit, I'm laughing now because you miss the point. That's another bunch of IO in your monolithic black box, and you're still building it.

Start here: https://www.rabbitmq.com/prometheus.html

229. simonw+HE[view] [source] 2021-06-12 14:35:14
>>hyzyla+(OP)
The best reason I know of to use a relational database as a queue is that it lets you trigger queue operations from within a transaction - so a queue item is guaranteed to be created if the transaction succeeds, and guaranteed not to be created if the transaction fails.

Brandur wrote a great article about that here: https://brandur.org/postgres-queues
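
A minimal sketch of that transactional-enqueue idea in Python/psycopg2 (the orders and jobs tables are invented for illustration):

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")

    def place_order(user_id, total):
        # The `with conn` block is a single transaction: either the order row
        # and its job row both exist afterwards, or neither does.
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO orders (user_id, total) VALUES (%s, %s) RETURNING id",
                (user_id, total),
            )
            order_id = cur.fetchone()[0]
            cur.execute(
                "INSERT INTO jobs (job_type, payload) VALUES (%s, %s)",
                ("send_receipt", json.dumps({"order_id": order_id})),
            )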

◧◩
238. snarky+cH[view] [source] [discussion] 2021-06-12 14:58:21
>>before+9r
ELI5:

Redis is really a mixed bag of many useful commands, see [https://redis.io/commands]. It has been referred to as the "swiss army knife" because you can build many custom solutions from these building blocks. Still, its most common use-case is in-memory key/value caching for performance reasons. For instance, if you have one database query that takes a particularly long time, you can execute it once and store the result in redis, and then retrieve that value extremely fast.
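
A bare-bones cache-aside sketch of that pattern (the products query, key name, and TTL are all made up):

    import json
    import psycopg2
    import redis

    r = redis.Redis()
    conn = psycopg2.connect("dbname=app")

    def top_products():
        cached = r.get("cache:top_products")
        if cached is not None:
            return json.loads(cached)          # fast path: served from memory
        with conn, conn.cursor() as cur:       # slow path: run the query once
            cur.execute("SELECT name, sales FROM products ORDER BY sales DESC LIMIT 10")
            result = cur.fetchall()
        r.set("cache:top_products", json.dumps(result), ex=300)  # cache for 5 minutes
        return result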

◧◩◪
242. welder+3I[view] [source] [discussion] 2021-06-12 15:07:08
>>rualca+f8
There are Redis-compatible alternatives that use disk space instead of RAM.

https://wakatime.com/blog/45-using-a-diskbased-redis-clone-t...

◧◩
250. pablod+EJ[view] [source] [discussion] 2021-06-12 15:20:23
>>mossel+14
For Elixir, Oban is a great library that allows the same, in case anyone was looking for one.

https://github.com/sorentwo/oban

◧◩◪◨⬒⬓
282. soroko+FS[view] [source] [discussion] 2021-06-12 16:46:20
>>kasey_+7I
"mongo said" - the obligatory mongo DB is webscale video:

https://m.youtube.com/watch?v=b2F-DItXtZs

◧◩
289. dereth+dU[view] [source] [discussion] 2021-06-12 16:56:32
>>nickjj+Tg
The elixir world has Oban[0] which implements quite a lot of advanced job features on top of PG. Admittedly it doesn’t quite have the usage of Celery and Sidekiq but most queueing libraries don’t.

[0] https://github.com/sorentwo/oban

◧◩
310. Nicksi+QZ[view] [source] [discussion] 2021-06-12 17:39:29
>>axiosg+o3
>They use packages such as celery which happen use Redis under the hood but the user doesn‘t have to deal with these details other than pasting some Redis connection URL somewhere.

Celery doesn't use Redis under the hood.

However, you may use Redis as a message broker and/or result store.

https://docs.celeryproject.org/en/stable/getting-started/bac...

322. fiznoo+t41[view] [source] 2021-06-12 18:13:30
>>hyzyla+(OP)
We’ve seen success in our current node.js application using pgboss [1] for job queueing. Since the primary database is Postgres, it’s nice to not introduce another dependency into the system, and take advantage of the infrastructure that we already have. The library supports most of what you’d expect from a job queue, and is crucially well maintained!

That being said, I agree with other comments that this is somewhat swimming against the tide. Redis is much more commonly used, and so if you don’t mind adding another service into the mix, I’d probably recommend going with Redis instead.

[1] https://github.com/timgit/pg-boss

332. gsvcla+Y81[view] [source] 2021-06-12 18:52:28
>>hyzyla+(OP)
I just did a thread on the topic myself https://twitter.com/dosco/status/1402909104012623873
333. gsvcla+591[view] [source] 2021-06-12 18:53:38
>>hyzyla+(OP)
And others, like this thread on building a MongoDB-like JSON database in Postgres in one line. https://twitter.com/dosco/status/1401413712842346501
335. gsvcla+991[view] [source] 2021-06-12 18:54:38
>>hyzyla+(OP)
And a search engine in Postgres instead of Elasticsearch https://twitter.com/dosco/status/1400643969030127620
◧◩◪◨
341. rantwa+Hc1[view] [source] [discussion] 2021-06-12 19:28:02
>>blowsk+gr
no. he/she actually has a point. a database is not a queue

https://mikehadlow.blogspot.com/2012/04/database-as-queue-an...

347. didip+xd1[view] [source] 2021-06-12 19:36:55
>>hyzyla+(OP)
The issue with Redis is that its distributed story is not great. I wonder if their Raft experiment is finally GA or not. https://github.com/RedisLabs/redisraft
◧◩◪◨⬒⬓
351. chx+ne1[view] [source] [discussion] 2021-06-12 19:44:18
>>bryanr+6K
mmmm no

They allow overbooking the flight but you can't book the same seat twice.

Overbooking can make financial sense -- even if you need to hand a little compensation from time to time it's still better than flying with empty seats. Of course it sucks big time for the travelers especially if there are no volunteers. IDB is miserable. https://www.transportation.gov/individuals/aviation-consumer...

◧◩
379. sigil+gp1[view] [source] [discussion] 2021-06-12 21:42:33
>>_ugfj+z2
Problems with this approach:

1. Recovery.

2. Concurrency.

3. Polling.

In more detail:

1. Recovery. Suppose a worker dies while processing a job. The job should be retried, but how do you know this, since expire > 0? You can impose a timeout, but that has drawbacks -- there might be long jobs you repeatedly start but never finish. To recover failed jobs without imposing a timeout, you'd have to run both the UPDATE and the job processing inside a transaction. That way, a failed job can result in an implicit ROLLBACK, freeing the job for future worker attempts.

2. Concurrency. So now recoverable workers are trying to claim jobs inside a transaction. If there are 10 jobs and 10 workers, we want all 10 jobs processing concurrently. However, each worker SELECTs the first item_id and tries to UPDATE that same row. 1 worker wins. The other 9 workers block while that transaction completes, then update zero rows. Concurrency will hover around 1.

3. Polling. In the transactional UPDATE approach, there's no way to tell whether there are free jobs short of trying to claim them via UPDATE, which blocks. So you must poll, and either burn cpu on quiet queues, or introduce an artificial delay into job processing times. The beauty of the SKIP LOCKED approach is that it can tell whether there are free (not currently processing) jobs. Even if you wake all 10 workers up via LISTEN / NOTIFY for a lone job, 9 workers in the thundering herd will fail to claim the job, and go back to sleep.

I blame TFA for not sufficiently explaining the SKIP LOCKED approach. A much better explanation is here: https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...
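
A rough sketch of what that looks like in Python/psycopg2, assuming a hypothetical jobs(id, payload, status) table and a placeholder process() handler; claiming and processing share one transaction, so a crashed worker's job is released by rollback, and SKIP LOCKED lets ten workers claim ten different jobs without blocking on each other:

    import psycopg2

    conn = psycopg2.connect("dbname=app")

    def process(payload):
        ...  # placeholder for the actual job handler

    def claim_and_process():
        with conn, conn.cursor() as cur:  # one transaction per job
            cur.execute("""
                SELECT id, payload
                FROM jobs
                WHERE status = 'pending'
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED
            """)
            row = cur.fetchone()
            if row is None:
                return False        # no free jobs right now
            job_id, payload = row
            process(payload)        # a crash here rolls back, freeing the job
            cur.execute("UPDATE jobs SET status = 'done' WHERE id = %s", (job_id,))
            return True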

◧◩◪◨
383. subleq+Yq1[view] [source] [discussion] 2021-06-12 22:00:04
>>phumbe+0k1
https://github.com/gavinwahl/django-postgres-queue
◧◩
398. simpli+2C1[view] [source] [discussion] 2021-06-12 23:49:11
>>nickjj+Tg
Nowadays Rails has good_job[0], which lets you stick with Postgres for background jobs until you need more than a million-ish a day.

[0] https://github.com/bensheldon/good_job

◧◩◪◨⬒⬓⬔
401. lootsa+hG1[view] [source] [discussion] 2021-06-13 00:37:42
>>kyleee+DP
Reminds me of "Starbucks Does Not Use Two-Phase Commit". https://news.ycombinator.com/item?id=6229004
411. gregor+SU1[view] [source] 2021-06-13 03:52:36
>>hyzyla+(OP)
Remember when CircleCI kept going down due to weirdness with their queued jobs in their database?

Read here https://status.circleci.com/incidents/8rklh3qqckp1

425. xdange+1o2[view] [source] 2021-06-13 10:33:43
>>hyzyla+(OP)
A great job queue for PostgreSQL running on Node.js https://github.com/graphile/worker

I've been very happy with it.

[go to top]