
Do you really need Redis? How to get away with just PostgreSQL
1. _ugfj+z2 2021-06-12 07:29:54
>>hyzyla+(OP)
You really don't need anything fancy to implement a queue in SQL. You need a table with a primary key and a "status" field. An "expire" field (a timestamp) can be used instead of "status"; we used the latter because it allows easy retries.

1. SELECT item_id WHERE expire = 0. If this is empty, no items are available.

2. UPDATE SET expire = some_future_time WHERE item_id = $selected_item_id AND expire = 0. Then check whether the UPDATE affected any rows. If it did, the item is yours. If not, another worker claimed it first, so go back to step 1. If the database has a sane optimizer, it will note that at most one row needs locking, since the primary key is given.

All this needs is a very weak property: a document-level (row-level) atomic UPDATE that can report whether it changed anything. (How weak? MongoDB could do that in 2009.)
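The two steps can be sketched in Python with the standard-library `sqlite3` module, so the example runs without a database server. This is a minimal sketch under stated assumptions, not the Drupal implementation: the table name `queue`, the `payload` column and all helper names are hypothetical, and `release_expired` is one assumed way the expire field enables retries (a periodic job that returns timed-out leases to the pool). In PostgreSQL under the default READ COMMITTED isolation level, a second worker's UPDATE on the same row waits for the first to commit and then re-checks `expire = 0` against the new row version, so it reports zero affected rows; that is the property the comment relies on.

```python
import sqlite3
import time
from typing import Optional, Tuple


def make_queue(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: expire = 0 means "available"; any other value is
    # the Unix time at which the current worker's lease runs out.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " item_id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " expire INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    cur = conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def claim(conn: sqlite3.Connection, lease_seconds: int = 30) -> Optional[Tuple[int, str]]:
    """Lease one available item, or return None if none are available."""
    while True:
        # Step 1: pick a candidate nobody currently holds.
        row = conn.execute(
            "SELECT item_id, payload FROM queue WHERE expire = 0"
            " ORDER BY item_id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        item_id, payload = row
        # Step 2: try to take it. The "AND expire = 0" guard means at most
        # one worker's UPDATE can change this row.
        cur = conn.execute(
            "UPDATE queue SET expire = ? WHERE item_id = ? AND expire = 0",
            (int(time.time()) + lease_seconds, item_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return item_id, payload
        # rowcount == 0: another worker won the race, so loop back to step 1.


def delete_item(conn: sqlite3.Connection, item_id: int) -> None:
    # Called once the job has been processed successfully.
    conn.execute("DELETE FROM queue WHERE item_id = ?", (item_id,))
    conn.commit()


def release_expired(conn: sqlite3.Connection, now: Optional[int] = None) -> int:
    # Assumed retry mechanism: leases that ran out (worker crashed or was too
    # slow) are reset to 0 so the item can be claimed again.
    now = int(time.time()) if now is None else now
    cur = conn.execute(
        "UPDATE queue SET expire = 0 WHERE expire <> 0 AND expire < ?", (now,)
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_queue(conn)
    enqueue(conn, "send-email")
    print(claim(conn))  # (1, 'send-email')
    print(claim(conn))  # None: the only item is already leased
```

A worker would call `claim`, do the job, then `delete_item`; if it dies instead, `release_expired` eventually makes the item visible again, which is where the "easy retries" come from.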

Source code at https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/... (We cooked this up for Drupal in 2009 but I am reasonably sure we didn't invent anything new.)

Of course, this is not the fastest job queue there is, but it is quite often good enough.

2. hughrr+gc 2021-06-12 09:18:43
>>_ugfj+z2
Oh please, stop using databases as queues. I have spent a disproportionate amount of time over the last 20 years undoing that decision. It doesn’t scale well at all.

3. skytre+vj 2021-06-12 10:41:18
>>hughrr+gc
This, please. I feel like "How to Get Away with Just PostgreSQL" and the GP comment fall squarely under being so preoccupied with whether you could that you didn't stop to think whether you should.

Whatever happened to using the proper data structure for the job? PostgreSQL and MySQL are, at the end of the day, B-trees with indices. Throw in relational properties and/or ACID too. Those aren't properties you need or want in a queue structure.

I know I don't have a solid argument against doing it; it's just experience (and, dare I say, common sense) telling me not to. Not quite like the parent, but I spent the first two years of my professional career on a team that had the brilliant idea to use DBs as queues. The big task I took part in during that stint was moving them off that v2 onto a v3 which used (wait for it) Redis. Everyone's quality of life improved with every migration, in proportion to the size of the v2 cluster we retired.

4. lolind+gF 2021-06-12 14:39:31
>>skytre+vj
What has me wanting to stick with postgres is that I work on a small team (two developers) and adding more technologies to our stack is extra overhead that's hard to justify. At our peak we're currently handling one request per second, and postgres for a queue is more than sufficient for that. Is there any good reason for us to add, learn, and maintain a technology neither of us yet knows? Or would we do just as well to abstract away the queue in the code so that we can switch to redis when we do run into scaling problems?

5. skytre+KL 2021-06-12 15:43:10
>>lolind+gF
> Is there any good reason for us to add, learn, and maintain a technology neither of us yet knows?

Absolutely, and that reason is that you are still a small team with a small user base to boot. That's a fantastic opportunity to learn a new technology and build on it properly! Remember, everything is easier in software engineering if you assume you have no users[1], and your situation is as close as it gets to that ideal. Leverage it.

Plus, as others and I keep saying, Redis (like other proper queues) isn't a complex addition to your infra. This isn't Hadoop, or Kafka, which is a very special type of queue (to put it one way).

> one request per second, and postgres for a queue is more than sufficient for that

Yes, I agree, but...

> Or would we do just as well to abstract away the queue in the code so that we can switch to redis when we do run into scaling problems?

What I read when I see such statements is the mythical software engineering ideal that, with enough abstraction, a migration is just a matter of writing a new class that implements some interface and then changing a config. For a sufficiently complex app infra, that almost never happens, because you can never keep the abstraction leaks to an acceptable level.
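The abstraction being debated usually looks something like this: an interface, one class per backend, and a config key that selects the class. All names here (`JobQueue`, `InMemoryQueue`, `BACKENDS`, `queue_from_config`, `queue_backend`) are hypothetical, and an in-memory deque stands in for a real PostgreSQL or Redis backend so the sketch runs on its own.

```python
from collections import deque
from typing import Deque, Dict, Optional, Protocol


class JobQueue(Protocol):
    # The "some interface" every backend is supposed to implement.
    def enqueue(self, payload: str) -> None: ...
    def claim(self) -> Optional[str]: ...


class InMemoryQueue:
    # Stand-in backend; a PostgreSQL- or Redis-backed class would expose
    # the same two methods.
    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def enqueue(self, payload: str) -> None:
        self._items.append(payload)

    def claim(self) -> Optional[str]:
        return self._items.popleft() if self._items else None


# Swapping backends is meant to be "write a new class, change a config".
BACKENDS: Dict[str, type] = {"memory": InMemoryQueue}


def queue_from_config(config: Dict[str, str]) -> JobQueue:
    return BACKENDS[config["queue_backend"]]()
```

Note what the interface leaves unsaid: how long a claimed item stays invisible, what happens when a worker dies mid-job, and whether ordering is preserved. A lease-based SQL table and a Redis list answer those questions differently, and callers tend to depend on whichever answers they got; that is one plausible reading of the abstraction leaks the comment mentions.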

Another thing: abstraction does not solve all your problems if the underlying implementation is a poor fit to begin with. Let me paint you a plausible scenario:

Once you are large enough, you might find your PGQ acting weird, and you realize it's because someone on the team wrote code that accesses your queue table like it's an actual table of records, not a queue. So you think, okay, let's prevent that from happening. Maybe you add users and permissions to separate connections that need the regular tables from those that need the queue. Maybe you start writing stored procs to check and enforce queue invariants periodically.

Well, guess what: all those problems would've been solved for free had you invested maybe one work day getting a Redis server running back when you were a two-person op serving one request per second.

Lastly, scaling a relational DB is an entirely different beast from scaling a queue. Scaling anything is never painless, but you can reduce the suffering when it comes. Would you rather scale PG so it can keep acting as a queue, or scale a queue that's, you know, really a queue in the first place? Heck, the latter might even be solvable by throwing money at the problem (i.e., giving it more compute).

[1] Except for the part where you need to make money, of course.

6. yongji+Hf1 2021-06-12 19:58:47
>>skytre+KL
> Absolutely, and that reason is that you are still a small team with a small user base to boot. That's a fantastic opportunity to learn a new technology and build on it properly! Remember, everything is easier in software engineering if you assume you have no users[1], and your situation is as close as it gets to that ideal. Leverage it.

I have to disagree. Of course code quality is important, but building things "properly" because "we may need it later" is a great way to kill a project with complexity. KISS, YAGNI. An early startup is, IMHO, not a good place to learn new frameworks while getting paid: you're on borrowed time.

Make a back-of-the-envelope calculation of how much throughput you need. E.g., if you expect to have 10,000 users and each may make one request per hour, you're dealing with roughly 3 qps. Anybody who wants to bring in a new dependency for this needs some talking to.
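The estimate is just users × requests per user per hour ÷ 3,600 seconds per hour. A tiny helper (the name `average_qps` is hypothetical) makes the arithmetic explicit; it yields an average rate, so real peaks will sit somewhat above it.

```python
SECONDS_PER_HOUR = 3600


def average_qps(users: int, requests_per_user_per_hour: float) -> float:
    # Average load only: assumes each user's hourly requests are spread
    # evenly across the hour.
    return users * requests_per_user_per_hour / SECONDS_PER_HOUR


print(round(average_qps(10_000, 1), 2))  # 2.78, i.e. "about 3 qps"
```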

(If you already need Redis anyway and it's a better fit than PostgreSQL, then sure, go ahead.)
