Do you really need Redis? How to get away with just PostgreSQL

>>hyzyla+(OP)
Funny. My approach is usually the other way around: Can I get away with just Redis?

replies(8): >>grenoi+L2 >>wokwok+T3 >>himinl+25 >>hardwa+wd >>siscia+Oi >>mianos+Ll >>dionia+bJ >>Jemacl+WU

>>hyzyla+(OP)
I've used PostgreSQL in the first two scenarios and would love an opportunity to in the third. It's worked really well and hasn't caused any problems under decent loads.

The one feature of Redis I'd love to have supported in PostgreSQL is to be able to set a TTL on a record. On a recent project where we could only save personal data for so long, it was a must have feature so we had to use Redis for that purpose instead.

replies(4): >>sologo+m1 >>phires+m2 >>halayl+xa >>kureik+5E

>>petepe+V
Wouldn’t a simple scheduled batch job be enough to go through periodically and drop records older than N days?

This would also give you audit logs, etc. As well as flexibility to adjust business logic without updating the TTL on all records.

replies(3): >>radica+G1 >>tbarbu+K5 >>petepe+gp

>>sologo+m1
It would, and you'd also get nice features like proper transactions and referential integrity.

BUT that comes at a cost. Doing that in Redis is just so easy, and the vast majority of the time no-one is going to notice a few concurrency errors, YOLO!

;-)

To be serious: it's a nice tool to have in the box and is amazing for inexperience/start developers because you can do a lot with very little time investment or knowledge.

replies(1): >>zigzag+77

>>hyzyla+(OP)
As German engineers are known to say: "Why make things simple when complicated will do?"

replies(2): >>Fannon+82 >>rualca+f8

>>arpa+R1
If you already have Postgres in your project and now you get the requirement for an additional cache store, I think it will be less complicated to reuse what you already have instead of adding another DB to your stack.

Of course, at some point of scaling needs a dedicated cache store will make sense anyway.

(Just some justifications from a german engineer :) )

replies(2): >>arpa+m3 >>mossel+V3

>>petepe+V
Redis EXPIRE doesn't actually delete any data after it expires though. Active deletion happens at random, so you can easily still have expired values in memory months later:

> Redis keys are expired in two ways: a passive way, and an active way.

> A key is passively expired simply when some client tries to access it, and the key is found to be timed out.

> Of course this is not enough as there are expired keys that will never be accessed again. These keys should be expired anyway, so periodically Redis tests a few keys at random among keys with an expire set. All the keys that are already expired are deleted from the keyspace.

> Specifically this is what Redis does 10 times per second:

1. Test 20 random keys from the set of keys with an associated expire. 2. Delete all the keys found expired. 3. If more than 25% of keys were expired, start again from step 1.

So really it's not much better than doing `SELECT value from keys where key=? and expires > now()` with manual deletion. Though I agree that it can be more convenient.

replies(3): >>mateus+e3 >>yukino+K4 >>arpa+Y4

>>hyzyla+(OP)
You really don't need anything fancy to implement a queue using SQL. You need a table with a primary id and a "status" field. An "expired" field can be used instead of the "status". We used the latter because it allows easy retries.

1. SELECT item_id WHERE expire = 0. If this is empty, no items are available.

2. UPDATE SET expire = some_future_time WHERE item_id = $selected_item_id AND expire = 0. Then check whether UPDATE affected any rows. If it did, item_id is yours. If not, loop. If the database has a sane optimizer it'll note at most one document needs locking as the primary id is given.

All this needs is a very weak property: document level atomic UPDATE which can return whether it changed anything. (How weak? MongoDB could do that in 2009.)

Source code at https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/... (We cooked this up for Drupal in 2009 but I am reasonably sure we didn't invent anything new.)

Of course, this is not the fastest job queue there is but it is quite often good enough.

replies(21): >>habibu+i3 >>0x0nya+u3 >>edwiny+fa >>cerved+Ra >>Twisol+Xa >>kalleb+8c >>hughrr+gc >>hardwa+oc >>pjmlp+5e >>wodeno+xh >>zapsta+Wm >>soroko+Lo >>ryanth+Vu >>dragon+gC >>virapt+lD >>joelbl+iI >>jayd16+oQ >>solips+MS >>ukd1+uj1 >>dillon+hl1 >>sigil+gp1

>>chmod7+K
My arsenal is Redis to SQLite to pg.

>>phires+m2
> Redis EXPIRE doesn't actually delete any data after it expires though.

I guess OP likes the simplicity that built-in expiration provides. In your example - all selects reading the value will need to have this expiration check. And also some scheduled process will have to be written to actually delete the values.

replies(1): >>jagged+Y9

>>_ugfj+z2
Right. And this article reminds me of people trying to use a SQL database like a filesystem. "Lock file before update otherwise it will get overwritten".

Don't use record lock as like it's an attribute of a row. Add a column on the table called "status" and use it.

>>Fannon+82
True. However, since cache is usually transient, adding this key-value store/cache is as easy as "docker run redis". No need to provision block storage, and it's really lightweight in comparison.

(that being said, I try really hard not to judge; after all, i'm not without fault: it's 2021 and i'm using bash over cgi-bin to serve web pages for my own hobby projects :))) )

replies(1): >>rualca+C8

>>hyzyla+(OP)
The point the author is missing is that most people are not deliberately choosing Redis.

They use packages such as celery which happen use Redis under the hood but the user doesn‘t have to deal with these details other than pasting some Redis connection URL somewhere.

replies(1): >>Nicksi+QZ

>>_ugfj+z2
I recently switched from rolling my own table to SQS.

The main issue is making a job queue dispatcher and runner which most people aren't familiar with.

>>hyzyla+(OP)
This seems a bit silly. We might be able to use only postgres, but this seems harder than using Postgres and Redis.

Any time someone says ‘this works for small workloads’ I hear, ‘we’ll eventually have to migrate this’.

That’s not a problem if you are talking about introducing a hugely complex piece of infrastructure, but Redis is hardly that.

replies(5): >>anagan+j4 >>ryanjs+R7 >>gizdan+Ua >>maniga+2N >>kristo+zo1

>>hyzyla+(OP)
Why do you need postgres, if you can get away with Redis? For all the 3 points mentioned in the article, I would rather just use redis as apposed to postgres. Unless, you really need a rds solution for the project.

replies(1): >>dkarra+n9

>>chmod7+K
Why would you choose to use a system that doesn't scale by default?

Single user local applications? Fair.

Web applications? Very strange choice imo.

Reddis is great, but it is *not* a database, and it's thoroughly rubbish at high load concurrency without clustering, which is (still) a massive pain in the ass to setup manually.

Of course, you can just use a hosted version off a cloud provider... but, it's generally about 10x more expensive than just a plain old database.

/shrug

I mean, sure, it's (arguably...) step up from just using sqlite, but... really, it's easy, and that's good... but it isn't good enough as a general replacement for having a real database.

(To be fair, sqlite has got some pretty sophisticated functionality too, even some support for concurrency; it's probably a step up from redis in many circumstances).

replies(4): >>tehbea+y4 >>arpa+a5 >>zigzag+p5 >>Hendri+8g

>>Fannon+82
Redis as a cache server requires different settings from redis as a job queue. So you can’t reuse the same server anyway.

>>hyzyla+(OP)
For rails I’ve used que in the past, which is a job queue adapter for Postgres. It supports activejob.

https://github.com/que-rb/que

replies(2): >>shaico+Eb >>pablod+EJ

>>Aeolun+z3
For thousands of applications deployed, they're most probably never need scale.

>>wokwok+T3
> Why would you choose to use a system that doesn't scale by default?

By all accounts Postgres seems to be a pain to scale off a single machine, much more so than redis.

replies(2): >>cbsmit+x8 >>hardwa+gd

>>hyzyla+(OP)
Hopefully one day projects like que[1][2] will become stable and battle-tested enough to use in a production environment. Until then I'll be using something like sidekiq (if you're going for a client-side job queue, eg: the clients don't really know about each other and only have rate-limiting, not true throttling).

With Postgres you also need to worry about high churn, especially since you are creating/locking/deleting rows constantly. This can be alleviated through a variety of means, of which personally I would use per-day table partitioning and truncate older partitions on a cron, not to mention the sharp increase in database connections to the host now required.

Ignoring the literal elephant in the room of synced writes to the store. Redis can be used quite effectively in a blocking manner with RPOPLPUSH/LMOVE(6.2+) for a reliable queue, allowing an item to not be lost because atomically the pop and push from two different lists are done together.

[1] https://github.com/que-rb/que [2] https://gist.github.com/chanks/7585810

>>phires+m2
I would contend that it really depends on what one would prioritize the most in that scenario. In my experience, Redis EXPIRE means it is not selectable. That is the primary requirement for a lot of development around EXPIRE/TTL. It is OK if it is still in memory in some form, it still won't be accessible by applications SDK or CLI. Since Redis 2.6 the expire error is from 0 to 1 milliseconds which is accurate enough for many use cases. Not to mention, Redis will handle that deletion for you. You don't need to run a deletion job and/or include an additional condition on a query.

Additionally, the expire/ttl/get/set in Redis is incredibly easy to use (and abuse, hence the OP article). Some team's criteria is limiting the amount of moving parts - and that's great. Don't use Redis and use a relational database for everything such as what you mentioned. Use it as a queue, a cache, a message broker, etc..

Other teams may care less about an extra moving part if it means their code will look simpler and they leverage relational databases for their more common usecases.

replies(2): >>nicobu+99 >>petepe+Ly

>>phires+m2
hmmm, I disagree that it's not better. Select operation implies index scan most likely with O(log n), while GET operation is essentially O(2-3). And you also have to run DELETE on sql to remove the expired keys.

Oh, and i'm not entirely sure about the part about redis active expiry (disabled by default, default is remove expired on lookup - lazy); you're talking about key eviction which applies to all deleted keys and AFAIR happens only when certain watermarks are hit. Since it happens in ram, it's also faaaast, unlike SQL DELETE, which will definitely involve disk...

>>chmod7+K
If your data has no value whatsoever, sure.

replies(2): >>gizdan+D8 >>cheald+Ia

>>wokwok+T3
redis is not a database. It's a key-value based cache. If you're using it as a database, you're gonna have a bad time.

replies(1): >>cube22+s6

>>wokwok+T3
I would say Redis with RediSearch is a database.

>>hyzyla+(OP)
Wordpress historically didn't need a cache and just used page visits and/or a cron job to kick off automated processes, backed by MySQL. Is it fast? Yes. Is it nearly as fast as Redis? No. Do you need it to be? Not for Wordpress lol

>>sologo+m1
Deleting large amount of data on Postgresql is expensive. First you need an index on a column to select the expired data, then you actually need to delete the rows which creates a lot of garbage, is heavy and slow.

Creating and deleting a lot of data on PG is a pain because of MVCC and vacuum. One useful trick is to partition data into tables and to truncate/drop entire tables, drop/truncate is instant and reclaims space immediately.

replies(4): >>plank_+T7 >>cbsmit+N8 >>sologo+O8 >>squigg+Ab

>>hyzyla+(OP)
Most of the times Redis's distributed lock works fine, however one should know that its not fail proof and you might run into really weird bugs

references:

0 - https://aphyr.com/posts/283-jepsen-redis

1 - https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...

replies(1): >>antire+I7

>>hyzyla+(OP)
This is a great article because it outlines an alternative to using Redis for any given use cases. If we don't constantly re-evaluate our toolsets and their capabilities, it can lead to poor decision making.

That being said, I've found Redis largely a pleasure to work with for these use cases and don't really see a real incentive to changing my current approach.

>>arpa+a5
Why so? It has persistence and I'm not aware of any reported data loss happening with it.

It's also got loads of complex and useful instructions.

replies(2): >>arpa+O6 >>jasonw+e8

>>cube22+s6
Data loss can occur between flushes to disk, for example (by default every 2 seconds / every I_FORGOT megabytes). Perhaps (most likely) it is possible to fine-tune the configuration to have redis as a very reliable data store, but it doesn't come with such settings by default, unlike most of RDBMSes.

replies(1): >>miohta+S8

>>radica+G1
What's wrong with Redis' transactions?

>>avinas+L5
0 - unrelated to RedLock algorithm, since RedLock has its own replication in the protocol.

1 - I replied in a blog post refusing most of the arguments, and then there was maybe another reply. You may want to read the full thing to form an idea about the safety of RedLock.

>>Aeolun+z3
Or don't use Postgres to begin with. The reason you need to SELECT FOR UPDATE is because Postgres doesn't implement READ UNCOMMITTED. If you're implementing an ordered queue, you have no way of reliably getting the next page of queue items without taking a lock using SELECT FOR UPDATE because there may be pending uncommitted transactions that you need to block on. On MySQL and SQL Server you can compare counts computed with READ UNCOMMITTED and READ COMMITTED (on MSSQL, with snapshot isolation) to quickly figure out whether there are any uncommitted transactions in the range without taking a lock. That being said, SELECT FOR UPDATE and UPDLOCK equivalents are the easiest to get right, and if you need higher throughput you probably need a message queue or Kafka-equivalent.

>>tbarbu+K5
You can do the equivalent by adding a Timestamp column with index on your table and add “where now - timestamp <= TTL”. (Or some computational easier way that doesn’t require math on the query)

TTL on systems like Cassandra is pretty ugly and deleting data is hard as you scale no matter how you do it. I don’t think Postgres would be able to implement a TTL that is magically better than the rest.

replies(1): >>neziru+G9

>>hyzyla+(OP)
With Elixir you can replace both with something like Quantum or backed by Postgres with Oban. Or even just a hand rolled genserver.

You can also listen to Postgres WAL changes to build an Event like system with Elixir. Supabase is doing this with their Realtime app.

>>cube22+s6
Redis is inherently lossy as a matter of basic design, and that's not even touching on the many other issues born of NIH solutions rampant within it. You may not hit the behavior until you push real loads through it. If you talk to anyone who has, I'm confident they'll agree with the criticism that while it may be an excellent cache, it should never be treated as a ground truth database. It's excellent as a slower memcachd with richer features. It's not a database. You can also read Aphyr's reports over the years, which to be utterly frank, bent over backwards to be charitable.

>>arpa+R1
> As German engineers are known to say: "Why make things simple when complicated will do?"

Isn't Redis an in-memory key/value store? Paying for RAM is not the same as paying for SSD.

replies(1): >>welder+3I

>>tehbea+y4
Which PostreSQL scaling pain point would you be referring to? Citus?

>>arpa+m3
> True. However, since cache is usually transient, adding this key-value store/cache is as easy as "docker run redis". No need to provision block storage, and it's really lightweight in comparison.

If you're using postgres for caching only, as you do Redis, then you also do not need to provision block storage.

If you happen to already have Postgres running for other uses, you also do not need to provision block storage.

Finally, I would add that Redis clients such as Redisson are resource hogs that cause performance problems on apps, while pg clients are barely noticeable.

>>himinl+25
Why? Redis has persistent data stores, and doesn't necessarily need to be memory only.

replies(1): >>himinl+6e1

>>tbarbu+K5
I was going to say... the standard approach is to simply partition the data by time, at which point it gets really easy to manage.

>>tbarbu+K5
Sure, there are different design patterns for different data sizes. This also adds querying complexity, so just depends on what’s needed.

Also, most applications have peak and low periods of load that are predictable (e.g. users concentrated in a given set of time zones) which make for good times to run otherwise disrupting functions, etc.

>>arpa+O6
Not all use cases require reliable data storage and it is ok lose few seconds of data. Think simple discussion forums, internal chat applications. There are some scenarios where ease of use and a single server scalability pays off in the faster development and devops cost.

replies(3): >>arpa+a9 >>charro+Pn >>EvilEy+Cs

>>yukino+K4
You could very easily create a database view that applies the where query, and even prevent your db user from selecting from the underlying table.

You could also use a library like PG boss to handle the cleanup task.

>>miohta+S8
GP was asking why redis is not a reliable storage solution/database. Redis is great as an unreliable (not source-of-truth) storage.

replies(1): >>kwdc+4U

>>ram_ra+K3
I think the article is assuming you already have Postgres for data storage and arguing against the addition of redis when that is the case, for the given common requirements that typically calls for the addition of redis into the mix.

>>hyzyla+(OP)
For pub/sub, I would recommend against using PostgreSQL if you're doing it at any kind of scale because LISTEN ties up one connection completely and Postgres connections are very expensive compared to a redis connection.

replies(3): >>dvfjsd+H9 >>gunnar+7a >>postgr+O81

>>hyzyla+(OP)
Has anyone tested how listen/notify in pg (for in-memory databases) compares to pub/sub in Redis?

>>hyzyla+(OP)
What happened to RabbitMQ? Is that not used as a queue any more?

replies(1): >>lwn+jp

>>plank_+T7
Deleting whole partitions is generally useful strategy. It's like the difference between single inserts and batch inserts (often huge performance difference, and much lower IO)

Since you mentioned Cassandra and TTL, I'll mention ClickHouse, very nice TTL options, splitting into smaller partitions and using "ttl_only_drop_parts=1" has prove itself in the production with big data ingestion rates.

Last, but not the least, I almost always prefer Postgres for data storage needs, one can trust it to be safe and fast enough. Only some specific situations warrant other solutions, but it's a long way until that point (if ever), and better not optimize too early.

>>pilif+p9
I was looking for your comment here, thank you. I hope they solve this problem somehow in a future release.

>>mateus+e3
I would access the table through a view that had that query built into it.

create table all_items(id integer, value text, expires timestamp);

create index all_item_expiry on all_items(expires);

create view items as (select id, value, expires from all_items where expires > now());

Then you can treat items as your base table and postgres neatly allows INSERT/UPDATE/DELETE from it. You'll need a job to clean up expires < now() items but it can be done at whatever arbitrary interval you like, could even be a trigger in PG if you were feeling spicy.

>>pilif+p9
The other issue with LISTEN/NOTIFY is that a client will miss any notifications sent while it is not running (e.g. due to crash, update, etc.). An alternative would be logical replication and change data capture (CDC), as provided for Postgres by Debezium for instance. That way, any consumers won't miss events during downtimes, as they'll continue to read from the replication slot from the last offset they had processed.

Disclaimer: I work on Debezium

replies(5): >>superm+zb >>hardwa+Cc >>jbvers+Ne >>BiteCo+si >>Michae+tW

>>_ugfj+z2
Great! So I learned this technique. But one thing that's not clear to me - how a "expired" datetime field can allow easy retries? Anyone give me an example? Thanks!

replies(1): >>chx+Sb

>>petepe+V
you can do that using postgres portioning and a trigger to drop old partitions from the table.

>>himinl+25
Redis can be made durable. The WAIT command allows you to guarantee writes to a quorum of nodes, and it can be configured for on-disk persistence rather easily.

That said, due to it's single-threaded nature, blocking on quorum writes is likely to bottleneck your application under any kind of significant load. It really shines at volatile data, and while it can work for valuable data, there are better tools for the job.

replies(1): >>himinl+Gf

>>_ugfj+z2
no you don't but it's a lot better to do an atomic select/update IMHO.

Basically you have status = 0,1,2 for queued, processing, done. You have a filtering CTE that returns the queued rows, sort by id and update status = status +1 of the top one.

You can even get fancy and disallow updates on the status column for other users to make it immutable.

I've done this in SQL server in a very similar way to as explained in the post

>>Aeolun+z3
> Any time someone says ‘this works for small workloads’ I hear, ‘we’ll eventually have to migrate this’.

Personally I aim to develop with future migrations in mind. Certain abstractions are worth it. Put an interface around a cacher, db store etc. Day to day dev overheads is low, but your future self will thank you should you need to migratw away.

Sure this isn't always possible, but do it as much as possible.

>>hyzyla+(OP)
As much as I love Postgres, I would rather use Redis for this. I haven't used Redis much though, and on our project we actually decided on using Kafka. Admittedly much heavier and maintenance intensive, it seems to do the job very well.

Any opinions and experiences here with Kafka vs Redis as a queue?

replies(2): >>arpa+ib >>ickyfo+5z

>>_ugfj+z2
For other readers, the `UPDATE` step is an exact anlogue of the "compare-and-set" atomic instruction [0]. It's really cool to see how you've realized it in SQL!

As a capability, compare-and-swap has an infinite consensus number [1], meaning it's sufficient to implement wait-free consensus algorithms with an arbitrary number of participants. That makes it a perfect fit for managing a scalable pool of workers that need to coordinate on consuming from a queue!

[0] https://en.wikipedia.org/wiki/Compare-and-swap

[1] https://en.wikipedia.org/wiki/Consensus_(computer_science)#C...

replies(2): >>chx+hd >>halifa+Il

>>sgt+Va
It's apples versus oranges with rabbits in the middle.

replies(1): >>des1nd+mg

>>gunnar+7a
isnt this also the case for redis pubsub?

replies(2): >>philjo+Rc >>Nick-C+Uv

>>tbarbu+K5
> Creating and deleting a lot of data on PG is a pain because of MVCC and vacuum. One useful trick is to partition data into tables and to truncate/drop entire tables, drop/truncate is instant and reclaims space immediately.

In this case, the requirement is that user data must only be kept for a certain time and not longer.

If that time is a property of the record create and we're allowed to keep the data with an error of a day, I guess it's easy: We partition the table on a daily basis, and delete any partition that is older than 28 days old. Sometimes, a record will be closer to 29 days old when it's deleted, but we accepted that, and it's easy enough to write queries so that it's unavailable to the application if it's more than exactly 28*24*60*60 seconds old if that's our constraint.

If the requirement is to keep it based on the last of a certain kind of use, we'd need to move it from one partition to another if we need to keep it. For instance, if we can keep data for 28 days after the user last longs in, we can't just drop the partition - unless we've moved the user each day they log in.

If we have that kind of a constraint, where data lifespan is based on properties that change over the lifetime of the data, is partitioning + drop actually a useful approach? The drop will still be instant, I guess, but it's the movement of data over its lifetime that concerns me here.

replies(1): >>sologo+qe

>>mossel+14
Another alternative is GoodJob:

https://island94.org/2020/07/introducing-goodjob-1-0

https://github.com/bensheldon/good_job

>>hyzyla+(OP)
I think one of the draws of redis was, back when it came out, how simple it was to set up and maintain in comparison to an RDBMS.

>>edwiny+fa
Successful items are deleted from the queue before they expire. So an item with expired > current time needs to be retried because it failed processing. So, a cronjob resets these to expire 0 and the next worker can pick them up.

replies(4): >>edwiny+Nj >>layout+hk >>ptx+5l >>johbjo+rT

>>_ugfj+z2
The way we did it (in MYSQL) was "UPDATE tasks SET processid=<unique id for the queue consumer task> WHERE processid IS NULL ORDER BY taskid LIMIT 1" and then look for processid to find your task. I guess the LIMIT on UPDATE might be an mysql extension? Then if your task crashes or dies you can look for invalid processids and do cleanup

replies(1): >>chx+Ic

>>hyzyla+(OP)
Oh but redis is much more than that.

It's so simple any little script can interact with it in seconds, instead of having to craft complex SQL or import your wrapper. You can call redis from the weirdest places, like from the inside of an nginx config file. You needn't even a compiled driver if that matters.

It's easier to get it to perform. It's possible to get great perfs with PostgreSQL, but you just got them for free with redis. Very hard to screw up. Read, write ? N+1 ? Batching ? Who cares, it will be fast no matter the concurrency or the load.

Sure you can expire with Postgres, but having done so in practice, it's way harder than it looks to get right. With redis, you don't have to care. Set a cache, set a lock with a timeout, store data your are not sure you need. It will all disapear.

Redis is not just key/value. You have sets, so ensuring unicity is a piece of cake, no constraints to define, then insertion to check against. And of course, you have sorted sets, which you are kept ordered at insertion by any number you pair the value with, such as a timestamp/score/index, and truncate by range for extra magic.

And then you have bonuses like hyperloglog which you need an extension for in Posgres. Super handy for stats.

Finally, you have streams, which for most apps will fill the gap for a timeserie database or a way to store your app logs. All that with a 2ms latency at 10k/s requests. None of my projects ever needed more than 1K/s though, even one with 1M users/month.

You have all of that with a dead simple install and basically no maintenance.

In fact, redis by itself consume almost no resource, it's not even worth it to not have it in your toolset. I just install it by default on all my projects: I never regretted it, there is always something it can do for you. It not now, just wait a bit, it's costing you nothing, and something will come up.

So no, let use Postgres for what it's great at, which is being a robust all purpose database. Redis is a fantastic complement to it, not a competitor, just use both.

Unless you are google size, there are little chances you will reach a stage where you need to migrate from any of them.

It's part of those tech that are just too good to be true, like SQLite or Python.

My only regret is that it doesn't exist on windows.

P.S: if you need caching and can't afford redis on a small python script, use Disk Cache, it's awesome: http://www.grantjenks.com/docs/diskcache/index.html

replies(1): >>junon+8n

>>_ugfj+z2
Oh please stop using databases as queues. I spent a disproportionate amount of time in the last 20 years undoing that decision. It doesn’t scale at all well.

replies(15): >>ngrill+Kc >>aidos+ri >>skytre+vj >>mianos+8l >>cerved+vn >>basilg+Bo >>blowsk+gr >>baq+9B >>atombe+fB >>vincen+sC >>crazyg+iU >>onlyre+yV >>maxrev+w01 >>sontho+ff1 >>holodu+g22

>>_ugfj+z2
Don't forget SKIP LOCKED[0]!

[0]: https://blog.2ndquadrant.com/what-is-select-skip-locked-for-...

replies(2): >>maskli+Bi >>Koffie+3j

>>gunnar+7a
Could you explain a little more about this (Debezium is awesome btw)? So after creating the proper replication slots, you set up change data capture via Debezium, then you listen to the CDC stream from the original DB you are connected to, correct?

This sounds like it might be vulnerable to dead consumers causing WAL to pile up without the right settings/periodic checks.

replies(1): >>gunnar+OX

>>kalleb+8c
Our initial did that too but generating a unique was removed soon after as had expire in there anyways for retries as I mentioned and that is enough.

>>hughrr+gc
Look at SKIP LOCKED in PostgreSQL and MySQL.

replies(1): >>hughrr+ud

>>superm+zb
I'd need to check, but I think so.

Which is probably why if you don't want loss if consumers go down a proper queue system (RabbitMQ, ActiveMQ, Amazon SQS, Kafka if you don't care about ordering between partitions) is the way I'd go.

>>hyzyla+(OP)
I imagine most people using Redis as a queue were already using it as a cache and just needed some limited queuing ability. Much like how places end up using a DB as a queue.

Using a DB as a queue has been a thing for a very long time. Every billing system I've seen is a form of a queue: at a certain point in the month a process kicks off that scans the DB and bills customers, marking their record as "current".

The challenge is always going to be: what if the worker dies. What if the worker dies, the job is re-ran, and the customer is billed twice. Thank god it's been many years since I've had to touch cron batch jobs or queue workers. The thought of leaving the office knowing some batch job is going to run at 3am and the next morning might be total chaos... shudder.

replies(1): >>cerved+pp

>>tehbea+y4
Postgres is not as automatic as other tools but is mostly an artifact of it being around so long, and focus being on other things. Few projects have been around and stayed as relevant as postgres.

Most of the time, you really don't need to scale postgres more than vertically (outside of the usual read replicas), and if you have tons of reads (that aren't hitting cache, I guess), then you can scale reads relatively easily. The problem is that the guarantees that postgres gives you around your data are research-level hard -- you either quorum or you 2pc.

Once you start looking into solutions that scale easily, if they don't ding you on performance, things get murky really quick and all of a sudden you hear a lot of "read-your-writes" or "eventual consistency" -- they're weakening the problem so it can be solved easily.

All that said -- Citus and PostgresXL do exist. They're not perfect by any means, but you also have solutions that scale at the table-level like TimescaleDB and others. You can literally use Postgres for something it was never designed for and still be in a manageable situation -- try that with other tools.

All that said, KeyDB[0] looks pretty awesome. Multithreaded, easy clustering, and flash-as-memory in a pinch, I'm way more excited to roll that out than I am Redis these days.

[0]: https://github.com/EQ-Alpha/KeyDB

replies(1): >>victor+3r

>>Twisol+Xa
Yes, the UPDATE command is the exact equivalent of LOCK CMPXCHG (the SELECT can be seen as computing the memory address). So the discussion about that in the comments of https://stackoverflow.com/a/59022356/308851 totally applies: if two queue runner threads pick the same item exactly one will succeed so it can't happen both tries and tries the same item. So there's no busy wait (reminder: a busy wait is where it does nothing just tests a condition), it just goes over every candidate until one succeeds.

replies(1): >>abhish+2G

>>ngrill+Kc
It’s not that. It usually ends up in the same storage engine as the main business functionality which leads to IO contention. The some sick and twisted bastard does a join to it from somewhere abhorrent which means it requires some serious heavy lifting to fix it.

This is unfortunately the status quo in the slightly less ideal universe than “seemed like a good idea at the time”

Use SQS or RabbitMQ or something.

Edit: Also if you’re using something SQL for queues you’re going to have to build out your monitoring stack yourself based on your implementation rather than suck up a metrics endpoint with queue stats or pull from cloudwatch.

replies(3): >>jbvers+te >>glogla+Ii >>flefto+Bj

>>chmod7+K
Hey I assume this is like a joke and not too serious, and we'd all switch off when things got a bit hairy, but I sure hope other readers can tell.

I am literally in the middle of digging a company out of this mistake (keeping Redis too long) right now. If your software/data is worth something, take a week or a month and figure out a reasonable schema, use an auto-generation tool, ORM, or hire a DB for a little bit to do something for you. Even MongoDB is better than redis if your're gonna do something like this.

replies(2): >>tluybe+Tk >>chrisd+Rl

>>_ugfj+z2
Yeah, most of the NoSQL solutions boil down to not learning how powerful SQL and its programming extensions are.

replies(2): >>rad_gr+Jo >>kasey_+hq

>>hyzyla+(OP)
A blog post series I've been meaning to write for over 3 years now:

* Every database a Postgres 1: Key/Value store

* Every database a Postgres 2: Document stores

* Every database a Postgres 3: Logs (Kafka-esque)

* Every database a Postgres 4: Timeseries

* Every database a Postgres 5: Full Text Search

* Every database a Postgres 6: Message Queues

Low key, you could make almost every single type of database a modern startup needs out of Postgres, and get the benefits (and drawbacks) of Postgres everywhere.

Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.

If anyone from 2ndQuadrant/Citus/EDB see this, please do a series like this, make the solutions open source, and I bet we'd get some pretty decent performance out of Postgres compared to the purpose built solutions (remember, TimescaleDB did amazing compared to InfluxDB, a purpose built tool, not too long ago).

New features like custom table access methods and stuff also shift the capabilities of Postgres a ton. I'm fairly certain I could write a table access method that "just" allocated some memory and gave it to a redis subprocess (or even a compiled-in version) to use.

[EDIT] - It's not clear but the listing is in emacs org mode, those bullet points are expandable and I have tons of notes in each one of these (ex. time series has lots of activity in postgres -- TimescaleDB, native partitioning, Citus, etc). Unfortunately the first bullet point is 43 (!) bullet points down. If someone wants to fund my yak shaving reach out, otherwise someone signal boost this to 2Q/Citus/EDB so professionals can take a stab at it.

[EDIT2] - I forgot some, Postgres actually has:

- Graph support, w/ AgensGraph now known as AGE[0]

- OLAP workloads with Citus Columnar[1] (and zedstore[2]).

[0]: https://age.apache.org

[1]: https://www.citusdata.com/blog/2021/03/05/citus-10-release-o...

[2]: https://github.com/greenplum-db/postgres/tree/zedstore

replies(3): >>rkwz+gf >>des1nd+Ag >>osigur+SZ

>>squigg+Ab
You’d probably already have indices by user then. There also other ways to store the data, for example not as a time series of user actions, but as another data structure. It just depends on your requirements and data model.

The gist of the original article is asking whether you could reduce tech stack complexity and use a single set of technologies for more use cases, allowing to understand the tool you are using better.

Also, note that a traditional database may or may not be the right tool for the job - there are different storage/behavior needs for writing lots of data, processing lots of data and serving the results.

replies(1): >>squigg+Lr

>>hughrr+ud
I dunno. If it’s usually the same problem, it should be easy to identify and fix.

Unless we’re talking about a high load, there should be no problem doing this.

replies(1): >>hughrr+xf

>>gunnar+7a
This is by design and very common with pub/sub.

>>hyzyla+(OP)
For use case 1 (job queue) I can only recommend beanstalkd[1]. Simple tcp plain text protocol. Install and forget. Plus you get some really useful features like deferred jobs, burying, kicking, job priorities, etc.

We have literally processed tens of billions of jobs without a single failure.

Old and simple technology that just works

[1] https://beanstalkd.github.io/

replies(1): >>mianos+tm

>>hardwa+je
> Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.

It's also useful when you want to quickly build a "good enough" version of a feature like search so you can get it in front of your users fast and iterate on their feedback. Most of the time, they'd be quite happy with the results and you don't have to spend time on something like managing Elasticsearch.

I wrote a post on how you can use postgres to add search capabilities with support for queries like

jaguar speed -car

ipad OR iphone

"chocolate chip" recipe

http://www.sheshbabu.com/posts/minimal-viable-search-using-p...

replies(1): >>hardwa+tf

>>rkwz+gf
Yup -- I'm a big fan of always writing the interface and the implementation, even if there's only one. You're always glad you wrote `Queue` and `PostgresQueue` when it comes time to write `KafkaQueue` or `NATSQueue`.

That said, I am an unrepentant yak shaver, and there is a lot to be said in just writing those things when you need it but, Postgres would be perfect for rapid prototyping in this way.

>>jbvers+te
It’s only the same problem until someone dumps 25,000 entries in your queue.

replies(2): >>kasey_+8o >>crazyg+3S

>>cheald+Ia
> Redis can be made durable

Postgres, SQLite and many others are durable by default. Almost all so-called databases are like that. When you need a database, 90% of the time, you want durable. People make mistakes, developers are people, developers make mistakes, and one such mistake is assuming that Redis is like other databases in being durable by default when it's not. It's not conjecture, I've seen it done in production.

replies(1): >>YetAno+sy

>>wokwok+T3
> sure, it's (arguably...) step up from just using sqlite

How so? What‘s wrong with SQLite?

replies(1): >>wokwok+7p

>>hyzyla+(OP)
Don't forget to use Partial Indexes on jobs/tasks tables with queries like WHERE status = 'pending'

More: https://use-the-index-luke.com/sql/where-clause/partial-and-...

>>arpa+ib
Ha ha. So true. In theory though you can make any Turing complete piece of software to do anything, so rabbits can be turned into oranges...

>>hardwa+je
But I don't get it, why would you use PG for all these if specialized systems (and arguably optimized for that use case) already exist?

replies(5): >>cnorth+Pg >>konsch+bh >>hardwa+zh >>jpalom+Li >>Gordon+jL2

>>hyzyla+(OP)
Does groupcache for golang works with cloud run ? As a redis replacement ?

>>des1nd+Ag
Operational ease

>>hyzyla+(OP)
I think one of the biggest advantages of using Redis for job queing vs Postgres comes down to library support.

For example Python has Celery and Ruby has Sidekiq. As far as I know there's no libraries in either language that has something as battle hardened with comparable features for background tasks using Postgres as a backend.

There's a big difference between getting something to work in a demo (achievable by skimming PG's docs and rolling your own job queue) vs using something that has tens of thousands of hours of dev time and tons of real world usage.

I'm all for using PG for things like full text search when I can because it drastically reduces operation complexity if you can avoid needing to run Elasticsearch, but Redis on the other hand is a swiss army knife of awesome. It's often used for caching or as a session back-end so you probably have it as part of your stack already. It's also really easy to run, uses almost no resources and is in the same tier as nginx in terms of how crazy efficient it is and how reliable it is. I don't see not using Redis for a job queue as that big of a win.

replies(5): >>RedShi+ph >>kortex+7U >>dereth+dU >>trulyr+lY >>simpli+2C1

>>des1nd+Ag
Because you already have a Postgres DB running probably and you know how to back it up, you know how to upgrade it, all your services can already authenticate towards it, your developers can run it locally, you know how to mock it…

>>hyzyla+(OP)
yes, do you really need PDF convertors when you can have them as PostgreSQL extensions.

the point (ok, one point of many) of REDIS is that it is not the main DB so you can have a sense of security and decoupling in the architecture. Besides - there is no silver bullet for all things. While you can have your app do everything with PostgreSQL (and much more with Oracle, something people dislike it about), the fact itself does not mean is a good design decision or is a more stable decision.

Because when you have redis for sessions, kafka for event streams, postgre (or else) for data storage - you have components that can fail separately and thus the system degrades gracefully.

replies(1): >>xupybd+7i

>>nickjj+Tg
But in Postgres you could write functions in your schema to handle job queueing/dequeueing with the additional benefit of being able to use it in any language that can connect to Postgres and being able to reuse the same SQL/interface across all languages.

replies(3): >>nickjj+Gi >>c-cube+xt >>rantwa+kd1

>>_ugfj+z2
> SELECT item_id WHERE expire = 0

I come across a lot of SQL databases that uses 0 instead of false.

Is there any good reason for this?

replies(3): >>xupybd+mi >>maskli+yi >>chx+Ni

>>des1nd+Ag
Just repeating what others have said:

- Postgres is probably already running (it's pretty good for OLTP workloads)

- Operational ease and robustness

- Cloud support everywhere for Postgres

- People know how to backup and restore postgres

- Sometimes Postgres will beat or wholly subsume your specialized system and be a good choice

- Postgres has ACID compliance and a very good production-ready grasp on the research level problems involved in transactions. I've never met an etcd/zookeeper cluster I didn't wish was simply a postgres table. Image being able to actually change your cache and your data at the same time and ensure that both changes happen or none of them happen (this is a bit vaporware-y, because locks and access pattern discrepancies and stuff but bear with me). You're much more unlikely to see Postgres fail a Jepsen test[0]

[0]: https://jepsen.io

>>larodi+dh
Yes but have you built a tool for the job or a Rube Goldberg machine?

Complexity comes at a huge cost. Only add it when the benefits out weigh the costs.

You could start out building a product that could scale to millions of uses overnight. But if you do that you've spent months building something with no users. You could have been in the market already, building revenue already. Your requirements will change as you find your market fit and you'll need to change things. The less you have built the easier it is to change. Why not leave the million user capabilities until you actually need it?

replies(1): >>berkes+ik

>>wodeno+xh
Only in databases that have no Boolean type.

Sometimes it's an artifact of the orm mapping.

>>hughrr+gc
The solution I use now is to have the queue in the dB and then have a single process pushing to another queue better suited to dealing with workers.

I drank from the “keep your queue out of the dB” koolaid once, but then you end up worrying about transaction boundaries and atomic writes to the queue and the dB. It totally depends on your workload, but in my case, I’ve found the halfway solution of the queue in the dB so you get to write to it within your transactions, with a dedicated queue for the workers to be a much better solution.

replies(2): >>hughrr+6l >>AtNigh+Dq

>>gunnar+7a
Then use streams.

>>wodeno+xh
So that you don't need a separate column for an associated datum?

>>hardwa+oc
`FOR UPDATE` is also a good idea.

>>RedShi+ph
I'm totally with you in this regard and I'd like to see that too but the reality of the situation is a job queue is more than slapping together a few SQL queries.

A good job queue will have all or most of these features:

    Prioritize jobs (queues, weighs to certain jobs, etc.)
    Scheduled jobs (running them once but X time in the future)
    Periodic jobs (running them every 2nd Tuesday at 3:33am)
    Static and dynamic configuration (CRUD'ing jobs at runtime, like adding new scheduled tasks)
    Re-try jobs (customizable strategy, such as exponential back off)
    Rate limit jobs
    Expire jobs
    Cancel jobs
    Unique jobs
    Batch executing
    Handling graceful shutdown (integration with your app server)
    Get metrics (status, health, progress, etc.)
    Browse job history
    Web UI (nice to have)

And I'm sure I'm missing things too. These are only off the top of my head based on features I tend to use in most applications. In other words, this isn't a laundry list of "nice to haves in theory", most of these are core or essential features IMO. I use them in nearly every web app.

Rolling all of these things on your own would be a massive undertaking. Tools like Celery and Sidekiq have been actively developed for ~10 years now and have likely processed hundreds of billions of jobs through them to iron out the kinks.

Even if you managed to do all of that and created it as a Postgres extension (which I think is doable on paper), that's only half the story. Now you'd have to write language specific clients to interface with that so you can create jobs in your application using a nice API. This would be a very welcome project but I think we're talking a year+ of full time development time to release something useful that supports a few languages, assuming you're already an expert with writing pg extensions, have extension knowledge about job queues and know a few popular programming languages to release the initial clients.

>>hughrr+ud
The benefit of having the queue in the same engine as the main business functionality is that you can do stuff in a single transaction.

Having the queue on its own means you have to handle exceptions like "job is done but queue didn't get updated" or get into distributed transaction which is very different challenge.

Also, operationally having one inherently stateful component (db) versus two (db, queue) makes things a lot simpler to manage, deploy, update, etc.

Not claiming this is a perfect solution, just that the tradeoff isn't obvious.

replies(1): >>hughrr+ll

>>des1nd+Ag
One practical thing is that consistent backups can become very difficult if you distribute your state to multiple places.

>>wodeno+xh
false would be "expired", this is called "expire" because it is a timestamp, it's the epoch time when the item expires. The worker presumably finishes before that time and if it does then the item is deleted. So, if any item has a nonzero expires in the past then it needs to be reset to 0 so it can be retried.

>>chmod7+K
It was the reason with https://zeesql.com formerly know as https://RediSQL.com was written.

My clients seems rather happy with the project.

>>hardwa+oc
The article mentions that.. The comment by chx was rather a more generalized idea for implementing a queue that is not database-specific.

>>hyzyla+(OP)
How about tags/sets?

>>hyzyla+(OP)
MySQL and Microsoft SQL server also support SKIP LOCKED.

>>hughrr+gc
This please. I feel like "How to Get Away with Just PostgreSQL" and the GP comment falls squarely under being too preoccupied with whether you could, you didn't stop to think if you should.

Whatever happened to use the proper data structures for the job? PostgreSQL and MySQL are, at the end of the day, b-trees with indices. Throw in relational properties and/or ACID too. Those aren't properties you need or want in a queue structure.

I know I don't have a solid argument against not doing it; it's just experience (and dare I say, common sense) telling me not to. Not quite like parent but I spent the first two years of my professional career in a team that had the brilliant idea to use DBs as queues. The big task I partook in for that stint is moving them off that v2 into a v3 which used---wait for it---Redis. Everyone's quality of life improved with every migration, proportional to the size of the v2 cluster we retired.

replies(4): >>zigzag+Hn >>lolind+gF >>Seattl+kV >>e12e+YQ1

>>hughrr+ud
So use a separate database.

Is contention with the application your only objection? That’s pretty weak.

What’s the real compelling reason not to use a database? You haven’t said.

replies(1): >>hughrr+Zk

>>chx+Sb
Thanks for the explanation and time is needed in order to digest it... I guess one might start with the simple "status" field approach ;)

replies(1): >>Dangit+uN

>>chx+Sb
Doesn’t sound like a robust solution. What if the job takes longer than your estimated expire time? Does the worker keep updating the expiry date on a background thread while doing the processing?

replies(1): >>cbo100+cs

>>xupybd+7i
> Yes but have you built a tool for the job or a Rube Goldberg machine? > Complexity comes at a huge cost. Only add it when the benefits out weigh the costs.

Im honestly unsure if you mean this as opposing "doing everything in Postgres" or as opposing "throw more services on the stack".

Because both are true for the statements. You have that complexity, regardless of where you implement it. Either you are building the rube-goldberg machine inside of postgres out of modules, config and SQL or outside of postgres with additional services.

The only way to really solve this is to avoid building that RG machine in the first place. In this case: don't have queues. In practice that probably means introducting complexity elsewhere, though.

replies(1): >>xupybd+Ge1

>>hardwa+wd
> take a week or a month and figure out a reasonable schema, use an auto-generation tool, ORM, or hire a DB for a little bit to do something for you.

Sorry but am I the only one who is very worried about the state of software? There are people who drank so much of the schemaless (which was not an actual issue for any dev worth her salt to begin with) that you have to dispense this kind of advice? I find that bordering on criminal if someone did that to you and carries the title programmer.

Again, maybe that is just me.

Edit: not an attack on the parent: good advice. Just didn't know it was that bad. And sad.

>>flefto+Bj
Contention, architectural separation, you have to build your own monitoring stack, not transactional with concerns outside the database without introducing distributed transactions and risk, no routing or distribution capability, you have to build it yourself, point in time message durability is somewhat dubious depending on your acknowledgement process which of course you had to invent yourself as well.

Not to mention numerous database administrators crying into their cups of coffee.

Enough?

Like I said 20 years of people fucking this up is my experience.

replies(2): >>flefto+Ml >>deskam+0u

>>chx+Sb
You mean expired < current time, right?

>>aidos+ri
Are you sure it works properly when something fails. That’s a wonderful situation I like to ask people because the answer is usually no. Then someone gets billed twice for something and then the guy who said this was a good idea gets shot.

replies(1): >>aidos+Nm

>>hughrr+gc
Skype at its peak used a postgres queue extension, pgqueue, at its core. Seemed to go OK for them. The extension did work around some limitations when running a pure sql queue in postgres. These days I would run Pulsar but if you only have a million users and already have postgres you will be fine.

replies(1): >>conrad+La1

>>glogla+Ii
The point of queues is sequencing, storage and decoupling of events for the sake of scalability and durability.

By putting it in the same transactional store with the same transactional boundaries you’re instantly shooting the whole point.

Not only that, most queues define boundaries between different logical systems where transactional boundaries do exist. At which point your database’s transaction scope is extremely limited in capability.

In the real world of messaging transactions span more than just the database engine.

It’s just bad architecture. Full stop.

replies(2): >>victor+mo >>dragon+OC

>>Twisol+Xa
Postgres’s transactional semantics are really useful when building a queue, because of how it interacts with the various pieces.

Connection 1

  LISTEN 'job-updates';

Connection 2

  BEGIN;
  INSERT INTO jobs ('a-uuid', …);
  SELECT PG_NOTIFY('job-update', 'json blob containing uuid and state change info');
  COMMIT;

Connection 3 (used when Connection 1 is notified)

  BEGIN;
  SELECT id, … FROM jobs WHERE id = 'a-uuid' FOR UPDATE SKIP LOCKED;
  UPDATE 'jobs' SET state = 'step1_completed' WHERE is = 'a-uuid';
  SELECT PG_NOTIFY('job-update', 'json blob containing uuid and state change info');
  -- do the thing here: computation, calling external API, etc. If it fails then rollback.
  COMMIT;

Because notify has transactional semantics, the notify only goes out at transaction commit time. You want to use a dedicated connection for the notify.

The only downsides I immediately think of are you will have every worker contending to lock that row, and you’ll need to write periodic jobs to cleanup/retry failures.

>>chmod7+K
If you want to use transactions for multi table updates you are probably best to use a proper rdbms. Not to mention read consistency. If you only have one table Redis may be fine. I usually find my work grows beyond one table. Redis does make an unbeatable cache.

>>hughrr+Zk
Meh. All your objections are hand wavey and not factual.

Databases actually work fine as a queue but emotionally you don’t like it. That’s fine it’s just not real strong objections.

What you have not said is “it physically does not work”, and that’s because it does work fine.

replies(3): >>hughrr+im >>philda+Jm >>rmetzl+hr

>>hardwa+wd
If you store protos in your Redis keys (like most people using “NoSQL” for data storage), this comment doesn’t have much punch. Pretty sure we all can think of some pretty high profile examples of NoSQL + structured data working very very well at scale.

replies(1): >>hardwa+Rm

>>flefto+Ml
I’m not saying it doesn’t work. I’m saying you’re shooting your toes off down the line.

>>fasteo+Ue
It is funny, every time I have used beanstalk over the years I have put it in thinking I'll replace it later. I never have. Multiple startups I have left years ago are still running it, untouched years later. No one mentions it probably because it didn't do anything but run. I wish the same could be said for RabbitMQ.

replies(1): >>Gordon+lM2

>>flefto+Ml
I don't think he's saying 'it cannot be physically made to work (given enough effort and discipline)'. The impression I got was more like 'if you go down this route then the incremental path of least resistance leads somewhere bad'.

>>hughrr+6l
Well, I only ever enqueue idempotent work, so again, in my scenario it can fail and be rerun, but I’m also unsure about what you think is a bad idea here.

Honestly, I had more code managing two phase commits so I would only push to the queue when I was sure (ish) the dB commit was ok.

Say you need to trigger something to send an email, but let’s say that there’s an error condition after and the transaction is rolled back. How do you handle that failure? In my scenario the queue is rolled back too, no harm to foul.

>>chrisd+Rl
I'm not trying to get on people who are using redis as a cache (for photos, or any other ephemeral data).

The idea I was trying to get at was using redis to store data traditionally reserved for OLTP workloads.

> Pretty sure we all can think of some pretty high profile examples of NoSQL + structured data working very very well at scale.

Well that's the thing, you very rarely hear of companies who cursed their decision early on to use NoSQL when they realized that their data was structured but in 20 different ways over the lifetime of the product. Some datasets only need light structure (key/value, a loosely defined document, schema-included documents), and other things should probably have a schema and be stored in a database with a tight grip on that schema and data consistency/correctness. Please don't use redis in that latter case.

replies(2): >>chrisd+zn >>dionia+pJ

>>_ugfj+z2
I have done something similar to this to implement queues in Postgres. And at quite a large scale with lots of workers all pulling items off the queue at once.

One huge advantage over, say, SQS, is that you also get easy visibility! You can query and see what’s been completed, what’s still not worked on, etc.!

Oh, and do you want a priority queue? Just add a priority field and sort before picking an item! Do you need to change the priority of an item while it’s already in the queue? Go right ahead. Do you want to enforce some constraints so that duplicates don’t get added to the queue? It’s a database — add the constraints.

If you’re already using a relational database, and now you need a queue, start with this approach. Odds are, it’ll work just fine, and it’ll give you the perks above.

replies(1): >>Gordon+PV

>>BiteCo+dc
Not to mention I've never personally experienced redis go down by itself - it's always some stupid user error that could have happened to any part of the infrastructure.

Redis is, far and away, _the_ most robust piece of software I've ever had the pleasure of deploying to production.

replies(1): >>snicky+Ro

>>hughrr+gc
In your experience, roughly at which point did it stop scaling and why?

replies(2): >>rmetzl+Zq >>hughrr+LB

>>hardwa+Rm
I mean, Google was built on protos in a “NoSQL” database (BigTable). I think maybe you are overindexing on personal experience.

replies(1): >>hardwa+3q

>>skytre+vj
> Whatever happened to use the proper data structures for the job?

This so much. People too often treat databases as magical black-boxes that should handle anything. Database is most often the bottleneck and choosing the proper storage engine with appropriate data structures can be 100x more efficient that just using the defaults. 1 server vs 100 can definitely make a noticeable difference in costs and system complexity.

While premature optimization is bad, choosing the right tool for the job is still somewhat important and will usually pay off in the long run.

replies(1): >>mirekr+Rs

>>miohta+S8
I work on a SaaS community forums service and I can assure you data loss is not acceptable to our clients.

As a result we use MySQL w/ memcached, although we are considering a swap to redis for the caching layer.

>>hughrr+xf
I’ve seen all manner of hokey queue implementations in sql going back about 20 years and all of them could handle 25k enqueue bursts. That wasn’t a problem for a Sybase database on a commodity host circa 2000.

I think if I were going to argue against using DBs as queues it would be around: heavy parallel write use cases, latency concerns of both reads/writes and scaling to millions of events per second.

If you don’t have those concerns using a properly normalized and protected schema (which you are doing anyway right? Cause if not you are already shooting your toes off) for queues goes a very long way and removes a very big operational burden and tons of failure modes.

replies(2): >>justso+gu >>hughrr+8D

>>hughrr+ll
Having a separate queue makes sense across application boundaries. If you need queuing mechanisms in your own application you should be able to get by with a dB for 99% of use cases

>>hughrr+gc
For the scale of most tasks, the existing RDBMS that a small app is using for its data is perfectly sufficient for the queuing needs of said app. This reduces complexity in the app and removes the need to maintain additional tools.

Frameworks like Laravel make this easy because they allow you choose different backends for your DB, queue, and cache, among other things, all or some of which can be an RDBMS like Postgres or MySQL/MariaDB.

When your app's need outscale what the RDBMS can rationally handle, then you can dedicate the resources to switch to a more specific tool as your growth has indicated you should.

I can't say what the percentage of apps that can handle things this way, but the anecdotes mentioned in this thread are clearly the cases where growth outstripped what a traditional RDBMS like Postgres are well-suited for, and so migrating was the right thing. The probably was likely identifying the need or delaying the transition until it was too painful. I don't think there's a magic bullet here, but I also don't think that starting with a dedicated queuing store tool is also always the right thing to do.

>>pjmlp+5e
How so?

>>_ugfj+z2
What happens if the process that performed 2. crashes before it was able to complete whatever processing it was supposed to do?

replies(2): >>jffry+ps >>jayd16+yQ

>>junon+8n
That + the best documentation in the industry. It is a joy to work with.

>>Hendri+8g
I suppose it's a bit more suitable to networked services than sqlite is, since it's natively a web api, and sqlite is natively a local-only solution.

...but, I started writing about clustering and the network API, but, I can't really articulate why those are actually superior in any meaningful way to simply using sqlite, and given the irritation I've had in maintaining them in production in the past...

I guess you're probably right. If I had to pick, I'd probably use sqlite.

>>sologo+m1
The requirement for the project was that the deletion should be handled in as fail-safe a manner as possible. Relying on external jobs, cron, other scheduled tasks etc were deemed a risk.

We were already using Redis for other things, it was the logical choice for this scenario.

>>poshar+r9
Other than it's absence from this thread, Rabbit still seems to be going strong.

>>hyzyla+(OP)
Why use Redis when you have more suitable solutions like RabbitMQ or Kafka?

Obviously depends on the scale and needs of a project, Postgres etc is fine for simple queues.

I often see people waste unnecessary time by writing their own complex solutions, resulting in increasing technical debt, when you already have perfectly suitable open source options available that do a better job..

replies(1): >>brentj+Au

>>deckar+3d
How would double billing occur if the worker dies. The way I would design this, the billing request and bill would be committed atomically such that a request can only be completed with exactly one associated bill. If the worker dies, no bill is created.

Also I'd detect a worker has died by recording the start-time and using a timeout. Furthermore I'd requeue requests as distinct new entities. A requeued entity would have a self-referencing nullable FK to reference its parent request.

replies(2): >>jffry+ht >>deckar+DY

>>chrisd+zn
Sure, but:

1) 99.9% of internet-facing/adjacent businesses are not Google and will never reach even 1% of Google's scale

2) Proto + BigTable is very different from just throwing stuff in redis/mongo. Proto schemas are compile-time enforced, which is great for some teams and might slow others down. Google enforces more discipline than your average engineering team -- this is overkill for most engineering teams.

>>hyzyla+(OP)
For simple task queues, yes pg is ok. For high loads, redis is still better as PG generates a lot of WAL & connection overhead.

>>pjmlp+5e
That’s a bit of rewriting history. NoSql, particularly document style stores came about at least as much as a reaction to the overweight policies & orms the sql world has made rampant at the time.

You’d be trying to store a tiny bit of simple state and all the books/articles would have you standing up read only views & stored procedures for all your crud ops. The document stores came along with a fresh perspective and easy scaling.

Then their were the columnar stores and time-series stores that really did solve the newer scale problems in ways the existing sql stores didn’t.

I’m a sql guy through and through but it’s important to recognize the nosql movement was a reaction to real pain points. Also it made the sql databases better.

replies(1): >>pjmlp+Vx

>>hyzyla+(OP)
Watch out for transaction ID and sequence exhaustion if you have a lot of things rolling through a table as a queue.

Postgres is awesome but logging and queuing in a table can cause gotchas you won’t have with redis.

>>aidos+ri
The outbox pattern. Can in many cases be solved by using topics.

replies(1): >>aidos+7s

>>hyzyla+(OP)
Sometimes we need network based data structures that are other than relational tables. Redis delivers quite a few of those .. You can always implement then in a SQL database but Redis is just a better solution for those cases.

>>cerved+vn
I also have a lot of issues with people using tables and cronjobs instead of just using queues.

I once was the one implementing this stupid idea. It's very easy to shoot yourself in the foot. It seems to be ok at first, but it is a stupid idea and will give nightmares to the operations team.

It works as long as the table is small. It gets especially problematic when there are sudden spikes in the workload, so that the table is growing rapidly. People don't test these scenarios enough.

Often it's important to do the jobs in the order they came in. This means you need to have the index set up this way or you'll end up with full table scans. You also need to remove jobs which are done, but this will also mess with the tables performance.

There are tricks with databases to help scaling this, but at the end of the day it's much simpler to just use a queue.

>>hardwa+gd
KeyDB is really good. We use it in production to achieve millisecond response times on millions of requests per second.

replies(2): >>hardwa+kt >>killin+aJ

>>hyzyla+(OP)
Can anyone explain the need for redis in a clear way for someone who knows how databases work but isn't a backend developer? What alternatives are there? What did people do to solve the same problem before redis existed?

replies(2): >>snarky+cH >>mixmas+Su1

>>hughrr+gc
You’re dealing with survivorship bias - those companies whose products were successful enough that they needed your services. You don’t speak to the teams who went bankrupt because they spent too much time fiddling with RabbitMQ, when Postgres would have been fine.

As always, there are trade offs, no silver bullets.

replies(12): >>2muchc+Bs >>alunch+ju >>skytre+gw >>paxys+hz >>robwwi+kE >>dorfsm+IJ >>cortes+MU >>ikiris+Ac1 >>rantwa+Hc1 >>piokoc+Fd1 >>pyuser+ok1 >>rkk3+T12

>>flefto+Ml
The range where databases work fine is pretty small. Sudden spikes in job creation will kill the performance of the database. Often enough the job creations come from external systems you can't control.

>>sologo+qe
> You’d probably already have indices by user then.

Do you mean "you'd probably already have indices by user then, so you won't be able to take advantage of quick drops"?

> There also other ways to store the data, for example not as a time series of user actions, but as another data structure. It just depends on your requirements and data model.

I suppose I want to be a bit more specific. Till now, if I wanted to deal with this issue, I would just have used a cron job and some indexes and taken on the load. But what is an example of a nice way of dealing with this? You get to make up plausible requirements and data model details, perhaps drawing on particular requirements you had when you faced a similar issue.

> The gist of the original article is asking whether you could reduce tech stack complexity and use a single set of technologies for more use cases, allowing to understand the tool you are using better.

The legitimacy of a question in a free-form conversation doesn't depend on the original inspiration, several iterations ago. But even if it did, the question is here is exactly about understanding the tools better and whether we can use one tool instead of two, so by your summary it's perfectly on topic.

> Also, note that a traditional database may or may not be the right tool for the job - there are different storage/behavior needs for writing lots of data, processing lots of data and serving the results.

The subquestion here is precisely if we can get away with just using Postgres for data with mandatory lifespan requirements that vary over the lifetime of the data.

Extra tools come at a cost - that is the presumption of this article and the thread it has spawned.

If we have to use Postgres, we need to assess the cost of doing this in Postgres before we can decide whether or not to pay the cost of other tools. Waving in the general direction of other tools isn't helpful; it probably takes as much work to enumerate candidates as it does to calculate the cost of doing it in Postgres.

>>hyzyla+(OP)
it's cool that they added listen notify to postgres; I wonder when they will add many-to-many relationships; my problems with postgress start when i have to do a many to many relationship. For these you need to do an extra table for the link entries; now performance tanks, and you need to denormalise this for read requests, that's where the fun starts...

replies(1): >>mixmas+fu1

>>AtNigh+Dq
Ah ha. Yup, that’s exactly it. Thanks for the pointer.

Edit: can you explain by what you mean by solving it with topics? I thought topics were a broadcast mechanism?

replies(1): >>AtNigh+aY6

>>layout+hk
Same for any queuing system. You need to set the expiry time long enough for the expected task duration.

In SQS, for example, you use a visibility timeout set high enough that you will have time to finish the job and delete the message before SQS hand it off to another reader.

You won’t always finish in time though, so ideally jobs are idempotent.

>>soroko+Lo
The idea, I think, is you wouldn't delete the job from the queue until the processing was done.

Of course, this relies on the jobs being something that can be retried.

replies(2): >>shawnz+ZF >>soroko+kN

>>blowsk+gr
In my experience proper abstraction is the key. If you have a clear abstraction and structure you can replace the implementation when you need to without too much disruption. If it’s too leaky you’re screwed.

replies(3): >>goodol+3t >>hughrr+lA >>aparso+qA

>>miohta+S8
> Not all use cases require reliable data storage and it is ok lose few seconds of data. Think simple discussion forums, internal chat applications.

That is definitely not ok. I'd be really pissed as a user if I wrote a huge comment and it suddenly disappeared.

replies(1): >>miohta+tx

>>zigzag+Hn
I think your "most often" is more like 0.01%, I'd say the inverse is true, that _most_ would be fine with single sqlite host or something like rqlite.

replies(2): >>otoole+cw >>zigzag+Ax

>>2muchc+Bs
100%! Create a good enough solution and put it behind an abstraction that remains stable when you later create an ideal solution

>>cerved+pp
For many billing activities, "days" is a small amount of time delay. One very viable option to "the worker died" is to just look for stuck jobs once the run is complete and kick out an alert to let a human decide what to do.

>>victor+3r
It really looks absolutely amazing, I feel guilty because I want to run a service on it, there's almost no downside to running it everywhere you'd normally run Redis.

Also in the cool-redis-stuff category:

https://github.com/twitter/pelikan

Doesn't have the feature set that KeyDB has but both of these pieces of software feel like they could the basis of a cloud redis product that would be really efficient and fast. I've got some plans to do just that.

>>RedShi+ph
You can extend redis with lua to perform atomic operations, too. And it has bindings in a lot of languages by virtue of its protocol being so simple.

>>hughrr+Zk
> you have to build your own monitoring stack,

How is this done in Redis automatically? At some point you are writing queries (albeit Redis ones) to pull metrics. Other queueing systems may expose an API but there is always some level of integration. With Postgres/other db I would write SQL which is their API and a powerful one at that.

I can couple events with triggers to auto-generate other events etc, have built in audit capability, roll up reporting etc, all with standard SQL. (all in the context of monitoring and reporting)

replies(1): >>hughrr+yB

>>kasey_+8o
I agree.

Going from a single database to a (database + queue) means two server processes to manage, maintain, observe, test etc.

I actually start with SQLite to reduce as much distributed state as possible, then move to something else once it’s proven it will not work.

>>blowsk+gr
You're not wrong, but if anyone on the team has already worked with a queue system that they need to manage then utilize something that scales. Otherwise there's no reason to use the cloud solution you're hosting with which hand hold you to make it work quickly with your integration.

>>dutchb+kp
I have had two separate benefits from using just Postgres in apps:

1. It’s one tool to learn. No need to upskill on topics, partitions, consumer groups, routers, brokers, etc. 2. It’s one transaction boundary. You get a lot of “magic” for free when you know that a rollback will not only revert application state but also NOTIFY events published and queue jobs in flight. This alone makes Postgres a powerful option.

>>hyzyla+(OP)
I’d think you don’t really need redis or elastic search in about 80% of the places you see it. You certainly don’t need the baggage.

>>_ugfj+z2
I believe something like this would have issues with multiple consumers.

>>superm+zb
For pub/sub yes that's correct. For full info though: Redis later added streams (in 5.x) for the don't-wan't-to-miss case: https://redis.io/topics/streams-intro

>>mirekr+Rs
rqlite author here. Happy to answer any questions about it.

https://github.com/rqlite/rqlite

replies(1): >>mirekr+0B

>>blowsk+gr
> You’re dealing with survivorship bias - those companies whose products were successful enough that they needed your services

In fairness, Redis has only been available for 12 years so someone who has been in the industry longer has probably encountered systems using DBs as queues for no better reason than an alternative was not available when the system was made. (Rabbit just a couple years older I think.)

But in this day and age, you have better options to start with c'mon.

> they spent too much time fiddling with RabbitMQ, when Postgres would have been fine.

Rather ironic claim in a thread full of tips, tricks, and gotchas just to make PG behave like a queue no?

I can't speak for RabbitMQ personally but Redis has got to be among the simplest drop-in dependencies I've ever encountered. For a simple queue, the defaults do out-of-the-box.

Gut feel also says, Redis defaults for queuing would also serve more than PG defaults for queuing so you could leave it untouched for far longer to focus on your product, get to profitability, etc. Feel free to debunk with data.

replies(5): >>hughrr+GA >>stickf+OK >>boardw+UL >>sh87+eM >>derefr+9R

>>hyzyla+(OP)
It's always wild when an article on HN has an answer to the exact problem i wanted to solve this weekend. Kudos!

>>EvilEy+Cs
It only disappears if there is a catastrophic failure. The likelihood for such thing to happen when you write a huge comment are less than jackpot in Las Vegas, a sensible risk tradeoff for better development experience and cost.

replies(2): >>TheCoe+RP >>obstac+vj1

>>mirekr+Rs
What would you then consider to be the most common bottleneck?

I agree that there are many cases with low workload where that would be plenty.

replies(2): >>mirekr+8A >>solips+zU

>>hyzyla+(OP)
While you are at it, don't forget to use UNLOGGED tables. UNLOGGED == In Memory.

But if you must use disk based table for Job queueing, set fillfactor = 50. This takes care of heavy updates.

Indexes are helpful but costs memory and CPU, so always make sure you partition the table based on job_type for performant pending job query.

I wouldn't recommend using LISTEN/NOTIFY unless you are okay with "at most once" semantics. I have used disk table based approach for PUB/SUB to replace Kafka. More fine tuned approach will also allow (job_type, consumer_group, publisher) as a partition key.

Ref - https://www.postgresql.org/docs/current/sql-createtable.html

replies(4): >>kureik+0E >>osigur+pH >>Benoit+9Y >>tpetry+tu1

>>kasey_+hq
What history rewriting?

I have been using RDMS all along since 1996.

Nokia NetAct was scaling GB of data across multiple clusters with OLAP reporting engine in 2005 with no hiccups.

The experience I had with DynamoDB kind of proved to me I haven't lost anything by staying in the SQL path.

Most NoSQL deployments I have seen, could have been easily done in Oracle or SQL Server, provided they actually had a DBA on the team.

replies(2): >>kasey_+7I >>nemoth+i01

>>himinl+Gf
Why does by default matters so much to you? Redis has persistence support and it can be easily turned on.

replies(1): >>himinl+XC

>>yukino+K4
The fewer moving parts bit is key.

It was a government project, written by one team (us) to be maintained by another.

The data that needed to be expunged was user signup data, upon completion the record was sent to a CRM and the Redis record destroyed. If the signup wasn't finished it's automatically removed after 12 hours.

Referential integrity wasn't really a problem, emails are unique and if we clash the two records are auto-merged by the CRM.

Setting up scheduled tasks, triggers, partitioning, cron, etc, is just more things that can go wrong. If they go wrong _and_ go unnoticed we end up with piles of data we shouldn't have. That would be many different kinds of bad.

Doing `redis.set(k, v, ex: 12.hours)` or whatever is just easier.

>>hyzyla+(OP)
SKIP LOCKED looks interesting; I'll have to try that. I've used advisory locks in the past, but I kept running into deadlocks when I tried to acquire them intelligently (just for popping off the queue). It was unclear why, at the time, so I just put an advisory lock on every transaction. Obviously that causes serious contention problems as the number of jobs and workers increase.

>>sgt+Va
Don't use Kafka as a queue.

(I haven't used Kafka for a while, if anything below is outdated let me know)

The main issue I've experienced was balancing work across consumers - if one consumer is slower than others (e.g. running on a machine with some other heavy workload). In such case when a larger number of jobs is enqueued and expected to process quickly it's possible that everything will complete except jobs which landed on the partition belonging to the slow consumer. There can be only one consumer per partition so others can't help.

One could consider using a single partition to avoid this imbalance but this means that there could be only a single consumer - not good.

In other queue systems consumers fetch up to N jobs from a shared pool when they need more work. This means that in the case of a slow consumer the lag is limited to by the time required to process N jobs.

This situation also arises if consumers are equally fast but some kinds of jobs are considerably slower. Random distribution across partitions mitigates it to a degree.

My case was precomputing caches after some event, the goal was to do this within 10 seconds. First implementation with Kafka worked but very often some consumer couldn't use as much CPU time as others and we didn't want dedicated nodes for this because this even was happening at most few times a day. As a result jobs from a single partition were processed 2x slower - we had to overprovision.

Replacing Kafka with a queue based on Redis solved this right away, all consumers were finishing at the same moment. Although it wasn't a large scale operation.

The second complaint about using Kafka as a job queue is not being able to modify the queue, e.g. to reorder jobs, postpone jobs or cancel jobs

replies(1): >>sgt+Wb1

>>blowsk+gr
You are wayyy overestimating how complex something like RabbitMQ or Redis is. You don’t need to hire entire teams of engineers or go bankrupt setting up an instance. It is less work than implementing a production-level queue in Postgres for sure.

replies(4): >>zomgwa+aB >>aether+mB >>fauige+EB >>mbrees+QB

>>hyzyla+(OP)
Okay, but why? Redis is perfect for job queues and pub/sub. Use the right tool for the job.

>>zigzag+Ax
Most common bottleneck is lack of competence.

Direct visible effects are wrong decisions entangled in spaghetti-like complexity.

It's hard to reach technical bottleneck in well designed systems. Computers are really fast novadays. They will vary greatly depending on what kind of system it is. Out of resources – cpu, memory, network, disk io – likely the weakest of them will be saturated first – network. But that's not a rule, it's easy to have system which will saturate ie. CPU before.

replies(2): >>hughrr+1B >>zigzag+mG

>>2muchc+Bs
Yes. Use a queue abstraction for which there are numerous queues available off the shelf for!

>>2muchc+Bs
In my experience, great abstractions without a good data migration blueprint is as about as useful as no abstraction at all.

In this example - great, you have a “queue” abstraction. When you switch to RabbitMQ or Postgres, how do you move your existing data without a quality of service or data loss? It can be difficult with production datastores even if the abstraction within your service is beautiful.

replies(1): >>lolind+OF

>>skytre+gw
I've got about 6 years operational experience with Redis. I would not use it for a queue. I struggle to consider using it as a data store these days as well with other options on the table. But that's a whole new thread so I'm not going into details here.

If you want a queue, get a queue.

If you want a relational database, get a relational database.

If you want a key-value store, get a key-value store.

Don't cross the streams.

replies(2): >>mikekl+8j1 >>Too+eb2

>>otoole+cw
Are you planning on adding websockets or something similar in the near future to support things like ie. data change notifications [0]?

[0] https://www.sqlite.org/c3ref/update_hook.html

>>mirekr+8A
This is the best post in this thread.

A lot of people don't see the effects of their decisions. They leave a company after 3-5 years and go and work somewhere else where they get to make the same mistake again. The bottleneck indeed is lack of competence.

As for technical bottlenecks, it's quite easy to hit a wall. Be it through layers of stupid or unexpected success. We have unexpectedly reached the limit of what is possible with x86-64 on a couple of occasions due to stupid decisions made over 10 years previously for which there is now no longer the budget or attention to fix.

>>hughrr+gc
the other side of the coin is that queue servers are databases, just optimized for the use case.

>>paxys+hz
I agree. I’ll also add that Redis is a level easier to operate than RabbitMQ.

>>hughrr+gc
You're not wrong, but every solution involves compromises. More broadly, I would argue, though, that almost nobody actually needs queues.

Sure, there are cases were you actually need strict ordering, but they are relatively few, in my opinion, involving things like transaction processing or other areas where the order of operations change the effect of the operations and must be strictly sequential.

Ordering itself brings significant technical challenges. For example: With a queue, you can only have strict ordering as long as you only process one item as a time, with no concurrency. Ordering also complicates handling of failures and retrying, because it means every newer item has to wait until the failed item has been dealt with; if you use things like dead letter queues, order is violated.

In almost all use cases -- anything from image processing to newsletter delivery -- a better system is to have an unordered list of items that anyone can take items out of, in any order. Then you throw as many workers at this system and design it so that there's minimal locking needed to prevent workers from stepping on each other's.

There are queue systems that have a purpose in a stack, specifically pub/sub brokers, which can be used to schedule work along several dimensions. You use the pub/sub system to signal work to be done, but each payload just refers to some state that's stored elsewhere.

replies(2): >>hughrr+NC >>gravyp+TC

>>paxys+hz
Have worked at multiple companies that successfully used redis or rabbit with teams of less than five engineers.

It's a little insane that a highly rated thread on HN is telling people to use postgres as their queuing solution. The world is wide, I'm sure that somewhere out there there is a situation where using postgres of a queue makes sense, but in 99% of all cases, this is a terrible idea.

Also, SQS and nsq are simple.

replies(2): >>politi+jY >>Gordon+211

>>deskam+0u
Um, I'm not suggesting using Redis. In actual fact I said elsewhere I wouldn't use Redis for queues.

As for triggers, reporting, audit, I'm laughing now because you miss the point. That's another bunch of IO in your monolithic black box, and you're still building it.

Start here: https://www.rabbitmq.com/prometheus.html

replies(1): >>Jarwai+Tj1

>>paxys+hz
The issue is not necessarily the complexity of RabbitMQ or Redis. The complexity comes from having to manage another (stateful) process that has to be available at all times.

replies(1): >>hughrr+UC

>>cerved+vn
It stopped scaling when we put in the PO for a $150k x86-64 box to run it on and finance said WTF.

replies(1): >>cerved+vX

>>paxys+hz
“Production-level” means different things to different groups. Many (most?) groups don’t operate at the scale where they need a specialized queue in a dedicated message broker. Simple queues in a DB will work fine. Even if it isn’t very complex, why not already use the infrastructure you probably already have setup — an RDBMS?

Now, if you’re using a nosql approach for data storage, then you already know your answer.

replies(1): >>gravyp+oD

>>hyzyla+(OP)
interesting... im in opposite camp. do you really need anything other than key/value store and a data structure? ive used redis exclusively for close to a decade.

>>_ugfj+z2
> You need a table with a primary id and a "status" field.

For a queue, you need a table with an insertion order and a status indicator. “Primary id” is not enough because “primary id” may not be ordered (e.g., a UUID primary key is a primary id.)

OTOH, while a DB can do this, therr are plenty of well-tested open source purpose built message queueing solutions, so the swt of scenarios where “roll your own with a DB” isn’t a misuse of effort is limited.

replies(1): >>lolind+pK

>>hughrr+gc
Exactly, GP does not sound like someone who has worked on anything with decent scale. Because if you did, you would know that it is a really, really bad idea.

replies(2): >>lolind+NG >>maniga+zM

>>atombe+fB
I think you're missing something important here by applying the worker pool simplification to the idea. Order is important i that case as well. It forces both fair scheduling of the work and the metrics from the queue can easily describe the overall latency of the work being done and the capacity and utilisation. The latter can be used to scale the consumers of the queue and/or alert on when things aren't working properly.

If you throw a proper messaging broker on the table like RabbitMQ then you're getting message acknowledgement as well which is important if you must do the work. A failed consumer will result in the work being rescheduled on another consumer transparently. Of course there are metrics you can see and get heads up when your consumers are not working.

Ergo, don't simplify something too soon. Even reliable delivery of newsletters is rather important otherwise why would all the tracking pixels exist? (they are a form of message acknowledgement)

replies(1): >>atombe+2j1

>>hughrr+ll
> The point of queues is sequencing, storage and decoupling of events for the sake of scalability and durability.

The point of queues in any particular application is some subset of that; the calculus of implementation approaches that make sense depends not on the abstract point of queues in general but on the concrete point of queues in your use case.

>>atombe+fB
I use Queues for retries, buffering, and parallelization. It's pretty easy to write most programs to support at least once and to run multiple queue workers that you auto scale.

>>fauige+EB
Yes. Pay Amazon to do it and add use all that saved time to add business value instead.

They'll also manage the consumers of the queue and scale them too! Serverless is bliss.

>>YetAno+sy
vi has persistence support

>>kasey_+8o
It wasn't a problem for Sybase on a commodity host circa 2000 because clearly that host wasn't doing a whole lot of other stuff. It's a big problem for our 48 core nodes with 2TiB of RAM and a metric shit ton of DAS NVMe. Ergo anecdotes don't scale either.

To clarify we just moved this entire problem to SQS.

>>_ugfj+z2
This sounds dangerous. Without some good way to prevent workers stepping on each other, that will result in lots of useless queries and failed updates if a lot of jobs get scheduled at the same time.

In the Drupal case, the database returns a single row which is pretty much guaranteed to be the same for all workers in between updates. You really don't want that in the same database your page views hit. At least selecting N rows at a time and then claiming a random item from them would be slightly better.

replies(1): >>virapt+xv1

>>mbrees+QB
My main concern would be monitoring. Most queue systems connect to alerting systems and can page you if you suddenly stop processing thongs or retrying the same query many many times. For a DB, since the scope of access is much larger, you don't get these sort of guarantees for access patterns and you essentially need to reinvent the wheel for monitoring.

All to save 2 to 3 hr of Googling for the best queue for your use case and finding s library for your language.

It makes sense if you don't care about reliability and just need something easy for many many people to deploy (ex: Drupal).

replies(1): >>bcrosb+GJ

>>truth_+Rx
Thanks so much for this. I have been looking for this in Postgres. MySQL has memory back table which helped us a lot in the past.

>>petepe+V
exact this. even MongoDB has TTL index which is amazing nice.

>>hyzyla+(OP)
Redis to me is the magic solution to solve so many complex multiple process syncing issues, a global lock, rate limiter, uniqueness with a set, stasher of all the temporary things… no migration, quick solution to complex problems- streams of events to organize pending tasks, etc… to me it’s all about the temporal / temporary state my application needs to keep track in order to work in a multiuser - multiprocess and multi service environment… just a different tool for a different job then what I would use a database for… but really it’s computer science, to each is their own

replies(1): >>19h+CP2

>>blowsk+gr
Great comment: Been reading Taleb’s Black Swan this morning on ascertainment bias, so this resonated well.

>>hyzyla+(OP)
This seems to come up on HN at least once a year. Sure it can work but LISTEN ties up a connection which limits scalability as connections are limited and expensive. Also, mitigation strategies like PgBouncer cannot be used with this approach (nor can scale out solutions like CitusDB I don't think).

Of course, if scalability is not a concern (or the connection limitations are eventually fixed in postgres), this would be a very viable approach.

>>hyzyla+(OP)
The best reason I know of to use a relational database as a queue is that it lets you trigger queue operations from within a transaction - so a queue item is guaranteed to be created if the transaction succeeds, and guaranteed not to be created if the transaction fails.

Brandur wrote a great article about that here: https://brandur.org/postgres-queues

>>skytre+vj
What has me wanting to stick with postgres is that I work on a small team (two developers) and adding more technologies to our stack is extra overhead that's hard to justify. At our peak we're currently handling one request per second, and postgres for a queue is more than sufficient for that. Is there any good reason for us to add, learn, and maintain a technology neither of us yet knows? Or would we do just as well to abstract away the queue in the code so that we can switch to redis when we do run into scaling problems?

replies(1): >>skytre+KL

>>aparso+qA
Isn't migrating a queue super simple? Have a window in which you only add to the new system and you listen to both. When the old queue is empty, delete it.

If you need to keep track of old data, then yes, migration is hard. But queues don't save old data—that's the point of a queue.

>>jffry+ps
But you will have marked it as in progress by setting the "expire" to a non-zero value, preventing any other workers from trying to work on it. How will they know that the worker which marked it actually crashed and will never finish?

By using SELECT ... FOR UPDATE SKIP LOCKED, the record will automatically get unlocked if the worker crashes.

replies(2): >>jffry+aH >>soroko+pN

>>chx+hd
I have been asked this question in interviews for how to prevent two people from booking the same seat. Interviewers don’t seem to be satisfied by the answer that I will let the query go to the database and see who ends up booking. They want some advanced locking mechanism using redis. Redis does serialize queries using lua thus avoiding two people from booking the same seat.

replies(4): >>bryanr+6K >>agf+7K >>kccqzy+De1 >>phamil+3V2

>>hyzyla+(OP)
We tried it with just PostgreSQL and struggled with missed jobs. We tried it with Redis + PostgreSQL and haven't looked back. I'll take the blame for not engineering the first version adequately (this was before upsert) but Redis has been useful in so other ways that I'm glad it ended up in our architecture.

>>mirekr+8A
Competence is expensive :) While I mostly agree, even well designed systems have (sometimes considerable) tradeoffs.

> It's hard to reach technical bottleneck in well designed systems. Computers are really fast novadays.

I have been listening to how fast moderen computers are for the better part of the past two decades, yet as I user I still have to deal daily with too many of slow software and slow web services.

replies(1): >>mirekr+yS

>>vincen+sC
Or maybe GP is someone who has worked on projects from their infancy. If your queue will be handling less than a request per second for the foreseeable future, why not just use the technologies already in your stack and plan a migration path now? What do I gain from adding Redis (and thereby making my system even more distributed, with all the hazards that carries) if my project isn't even guaranteed to make it to 10 rps before going bankrupt?

>>shawnz+ZF
Not sure. Maybe something else is responsible for identifying "stuck" jobs and kicking them out to humans to decide what to do (assuming worker crashes are rare)

If we're talking PostgreSQL specifically, and newer-ish Postgres (9.5+ I think), then you can leverage its abilities to do all this in one atomic query:

  UPDATE jobs
  SET status='working'
  WHERE id = (SELECT id
              FROM jobs
              WHERE status=NULL
              LIMIT 1
              FOR UPDATE
              SKIP LOCKED)
  RETURNING id

replies(1): >>shawnz+mH

>>before+9r
ELI5:

Redis is really a mix-bag of many useful commands, see [https://redis.io/commands]. It has been referred to as the "swiss army knife" because you can build many custom solutions from these building blocks. Still, its most common use-case is for in-memory key/value caching for performance reasons. For instance, if you have one database query that takes a particularly long time, you can execute it once and store the result in redis, and then retrieve that value extremely fast.

>>jffry+aH
If you are willing to use SELECT ... FOR UPDATE SKIP LOCKED, then you can implement the queue without any UPDATE query or status column at all. Just lock the record to mark it as in progress, that is how you get the benefit that when the worker crashes, it will automatically be returned to the queue.

>>truth_+Rx
My understanding is, UNLOGGED means that changes are not written to the WAL and data can be lost in the event of an unscheduled shutdown. It doesn't mean that the table only exists in memory however - the data is still eventually persisted to disk.

replies(1): >>truth_+qJ

>>hyzyla+(OP)
The world would be a simpler place if people actually knew their storage engine like Postgres. I mean, knew it. All the features and how to leverage them in a performant way. Would eat less cloud energy too.

>>rualca+f8
There's Redis-compatible alternatives that use disk space instead of RAM.

https://wakatime.com/blog/45-using-a-diskbased-redis-clone-t...

>>pjmlp+Vx
But that’s sort of the point. You are saying “if you have a DBA and use proprietary products and have the skill to understand the trade offs of running a db with another data warehouse product layered in you can handle GB of data”.

Mongo said “Throw data at me and I’ll scale with very little work”.

Now, I’ve always largely believed that’s penny wise and pound foolish but it’s certainly a good pitch.

replies(2): >>soroko+FS >>pjmlp+7Z

>>_ugfj+z2
So you need to keep polling the db?

replies(1): >>lolind+9J

>>joelbl+iI
Postgres has LISTEN/NOTIFY for this.

>>victor+3r
Are you Google search? How do you have millions of requests per second?

replies(2): >>maniga+QM >>qetern+bW

>>chmod7+K
well... how much uptime do you need and how much resources do you have to devote towards achieving your desired uptime

>>hardwa+Rm
operations aside, the big problem in my experience dealing with these systems is you are extremely limited (on purpose) and cant do much sorting/filtering/aggregation/querying. that's what really makes true db's powerful. I love redis for what it does, i just dont think it replaces a DB well in many cases where its non-transient data

>>osigur+pH
From the official doc link i shared already:

UNLOGGED

If specified, the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log (see Chapter 29), which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well.

replies(1): >>andrew+2L

>>mossel+14
For Elixir, Oban is a great library that allows the same, in case anyone was looking for one.

https://github.com/sorentwo/oban

>>gravyp+oD
We use pingdom that hits a page that gives it the health of various systems. Queue included.

> All to save 2 to 3 hr of Googling for the best queue for your use case and finding s library for your language.

The cost of using a new piece of tech in production is not just 2 or 3 hours.

replies(1): >>gravyp+ML

>>blowsk+gr
There is a lot of space between Postres and RabbitMQ. If performance is a factor it takes alot of work to make a Postgres table fast enough. These days, and for the past at least 5 years, you're better off using a service like SQS, or setup Redis yourself.

>>abhish+2G
based on experience arriving very last minute to flight, nobody has problem with letting two people book the same seat.

replies(3): >>kyleee+DP >>chx+ne1 >>abhish+KG2

>>abhish+2G
I think this being a good answer hinges on the resulting user experience. If one person spends several minutes going through the process of putting in their payment and other info only to be told you don't have a ticket, they're going to be pissed. So you need a good answer on how to avoid that experience, however you implement the locking.

Saying this as someone who has been on both sides of that interview question, and also evaluated performance in it as a hiring manager.

replies(3): >>lstamo+zR >>spockz+D51 >>abhish+qG2

>>dragon+gC
Why on earth would you make a queue table with a non-serial primary key? What benefit do you gain from a UUID here?

>>skytre+gw
There's one incredibly valuable feature you get with PG that you will never get with Redis or RabbitMQ or any other task queue: Enqueueing tasks transactionally with your unit of work.

If your load requirements fit using your database as a queue, it can radically simplify your system.

I've personally done a fair bit of migration from formal queue systems into an RDBMS. You simply get a lot more control and visibility that way. But I use task queues heavily too. It's all very application-specific and I don't think you can generalize.

replies(2): >>politi+lX >>Gordon+d01

>>truth_+qJ
Uh, yeah, that confirms what osigurdson said, not that they're in-memory. For that IIRC you need to mount a ram-disk in your OS and put the table on that. Definitely also make it UNLOGGED, though.

>>hyzyla+(OP)
Surely redis is trivial to set up. Postgres gives you lots more maintenance and admin headaches.

>>lolind+gF
> Is there any good reason for us to add, learn, and maintain a technology neither of us yet knows?

Absolutely and that reason is, you are still a small team, with a small user base to boot. That's fantastic opportunity to learn a new technology and build on it properly! Remember everything is easier in software engineering if you assume you have no users[1] and your situation is as close as it gets to this ideal. Leverage it.

Plus, as me and others keep saying, Redis (and other proper queues) isn't a complex addition to your infra. This isn't Hadoop, or Kafka, which is a very special type of queue (one way to put it, at least).

> one request per second, and postgres for a queue is more than sufficient for that

Yes I agree but...

> Or would we do just as well to abstract away the queue in the code so that we can switch to redis when we do run into scaling problems?

What I read when I see such statements is this mythical software engineering ideal that with enough abstraction, a migration is just a matter of writing a new class that implements some interface and then changing a config. For a sufficiently complex app infra, that happens almost never because you could never keep the abstraction leaks to an acceptable level.

Another thing, abstraction does not solve all your problems if the underlying implementation is poor fit to begin with. Let me paint you a plausible scenario:

Once you are large enough, you might find your PGQ acting weird and you realize it's because someone in the team wrote code that accesses your queue table like it's an actual table of records, not a queue. So you think, okay let's prevent that from happening. Maybe you add users and permissions to distinguish connections that need to access between tables proper and queue. Maybe you start writing stored procs to check and enforce queue invariants periodically.

Well, guess what, all those problems would've been solved for free if you invested maybe one work day getting a Redis server running when you were a two-person op serving one request per second.

Lastly, scaling a relational DB is an entirely different beast from scaling a queue. Scaling anything does not ever come painless but you can reduce the suffering when it comes. Would you rather scale PG so it can keep acting as a queue or scale a queue that's, you know, really a queue in the first place? Heck the latter might even be solvable by throwing money at the problem (i.e., give it more compute).

[1] Except for the part where you need to make money, of course.

replies(1): >>yongji+Hf1

>>bcrosb+GJ
If you're at the scale where your postgres db can be used as a queue and no one on your team has experience running a these systems (most startups) then pretty much anything will work to begin with and as long as you have a clear interface that separates your code from your deps for the queue it'll be easy to swap out.

At $JOB-1 I wrote a `Queue` and `QueueWorker` abstraction that used an environment variable to switch between different queue backends while providing the same interface. Because of this I got everyone up and running with Redis lists as a queue backend and then could add in things like MQTT or RabbitMQ as things evolved. I also defined very loose constraints for the queue interface that made it so the implementer. Essentially there was a `push()` which added something into the queue or failed and returned an error. Then there was an `onWork()` which was called whenever there was work "at least once" meaning your system had to handle multiple instances being delivered the same work item. We would only ack the queue message after `onWork()` completed successfully.

There's not really anything preventing a team from doing this, putting a pin into it, and coming back to it when there's a scalability or reliability concern.

>>skytre+gw
You (or whoever is doing ops) will have to pay with your time to operate Redis until the product dies though.

I don't think adding another component (especially one with storage/backup requirements/more complex failover procedures) should be taken lightly.

Paying attention to the tips/tricks/gotchas is something you'd pay once and hopefully document in source code/documentation.

If you piggyback off someone else's work (another team at your job, a cloud-based solution that handles everything, etc) the calculus would change, of course.

replies(1): >>skytre+a41

>>skytre+gw
Using a tool without knowing reasonable bounds of the domain, current requirements, and how the tool (redis in this case) solves the problem isn't good advice.

Case in point our team went with Redis, just the default, use it blindly without fully understanding our requirements and how redis helps scale.

2 years later we spent 2 sprints, holding back the release trying to understand RDF vs AOF, and why we're seeing massive spikes in consumption and performance triggering pod eviction, running comparison tests to prove which works better and explaining why, running qa tests (regression, performance, load), introducing postgres for queuing, redoing our code to bypass the sync mechanism between how data flows between redis and postgres, updating dependenciies, migrating existing customer data (various on site locations), explaining this to team members, managers and their managers, installation technicians, support engineers and presenting it at engineering townhalls as a case of bad decisions.

Not worth it.

replies(2): >>satees+zO >>skytre+i21

>>hyzyla+(OP)
Get your requirements in check.

How sensitive is your app to latency? How much data and request volume you need to handle?

Do proof of concepts, write thorough load tests and go for what makes sense.

Either way, no matter which tech you choose, make sure monitoring and alarms are in place and that you do regular maintenance exercises.

>>vincen+sC
Very few things have "decent scale". 99% of apps can run on a single server with modern hardware.

>>killin+aJ
Lots of industries and applications can get to that scale. My last few companies were in adtech where that is common.

replies(1): >>killin+OF1

>>Aeolun+z3
Most workloads are small workloads. Modern hardware is incredibly powerful.

>>jffry+ps
Looking at the code, some steps are missing in the description:

3. When processing is complete the matching table entry is deleted.

4. There is a "reaper" process that checks for stale jobs ( based on time in the future ) and resets them to 0.

This of ouffcourse raises more questions.

>>shawnz+ZF
This approach has its own gottchas.

>>hyzyla+(OP)
I feel I could write a similar article titled:

  Do you really need PostgreSQL? How to Get Away with Just SQLite

How many apps are running pg on a single node and don't really have plans to scale beyond this?

Exclusive ownership of a SQLite database by the application can deliver better performance than exclusive ownership over a single node postgres database. This also extends to SQL Server, DB2, Oracle, or any other hosted solution.

replies(1): >>aeoleo+JV

>>edwiny+Nj
The status doesn't help if the worker died during processing. If you've got something that's been "in progress" for 20 minutes of a job that takes 2, your job might not be getting done. That's what expired is for.

replies(1): >>edwiny+J75

>>sh87+eM
I believe you meant "RDB vs AOF".

Also reading your reply I get the impression that "sync mechanism between redis and postgress" was the bottleneck. Wondering if you can add some details around it and also was this something that can't be fixed by fine tuning redis config, rather than completely removing it from your stack.

replies(1): >>sh87+9P

>>satees+zO
Yes I did mean RDB. Thanks for pointing out.

There were many problems but at the core of it, this was us having redis write huge amounts of data to disk very frequently causing this.

We could not reduce the frequency (product would not allow) and we couldn't find a way to make the writes reliably fast.

I like to believe there exists a possible way of handling this, but point being, our team had no time to find out, how redis works internally and have confidence that the new way won't bring up new surprises.

>>bryanr+6K
Let the check-in attendant handle it; that's what I call a no-code solution! You're hired

replies(3): >>Tostin+u41 >>lootsa+hG1 >>abhish+ZG2

>>miohta+tx
How is that a sensible tradeoff compared to just using something that was actually designed to be a database when you need a database?

replies(1): >>miohta+fc2

>>_ugfj+z2
A single Rabbit node seems easier than rolling this yourself and doing all the work to test it.

replies(3): >>solips+0S >>z77dj3+X31 >>jasonw+qk1

>>soroko+Lo
I think they want to select expire < now instead of expire = 0. Delete the row on completion.

>>skytre+gw
> Rather ironic claim in a thread full of tips, tricks, and gotchas just to make PG behave like a queue no?

There are libraries in many languages written specifically to accomplish the "make PG behave like a queue" part. All these people are explaining the principle of how it would work, but in practice, you can just pull in one of those libraries and be done. (Or, more accurately, you use an abstract external-job-queuing library and tell it to use its Postgres adapter, which in turn uses that library.)

In exchange, you get the ability to not have to set up / maintain any more infra than you already have. Which can be important if you're already "at scale", where every component you set up needs to be duplicated per geographic-region, clustered for High Availability, etc. If you've already done all that for Postgres, it'd be nice to not also have to do all that again, differently for Redis, RabbitMQ/Kafka/NATS, etc.

> Gut feel also says, Redis defaults for queuing would also serve more than PG defaults for queuing

Don't know about what you've got your queues doing, but our service only does (persistent, out-of-process) queuing for two purposes:

1. Global scheduled background-job queuing. (Think "cron jobs.") There are only a static number of these; they don't scale up with load.

2. Lifecycle email dispatch. This scales up with MAU, but with very very low coefficients.

For those cases, Postgres would totally be enough. (Anything would be enough!)

Most "queues" needed in the real world are requirements-light like this. It's a rare company whose product is doing something Apache BEAM-alike with its data, where a high-throughput reliable MQ is a fundamental part of the data workflow. (And those companies, I hope, know that they need an actual MQ, not a piece of software that does queuing in its spare time.)

Our company doesn't use Postgres queuing; we do, in fact, use Redis for it instead. But we only ended up doing that, because we already needed Redis for other stuff; and if you already have Redis in play (including an ops plan for scaling it), then it's the better tool for the job.

replies(1): >>skytre+v11

>>agf+7K
Send bookings to the database early, with an expiry time and a counter visible on the screen?

Thing is, because purchasing is often shared across third-parties, even then you can’t guarantee that you’ll have the reservation until it’s paid and entered into the shared booking backend… unless the third party backend has a locking mechanism, at which point you’re probably not using your own Postgres to do this, but their mainframe instead.

>>jayd16+oQ
But if we always do what's easiest, then we will make a bunch of horrible choices.

Other things that matter, often more than what's easiest:

* Flexibility. If you aren't 100% sure what you'll need, but need something in place today, it can make sense to choose an unopinionated, generic solution.

* Familiarity. Maybe you've never used Rabbit.

* Ops simplicity. Maybe you already have a bunch of postgres deployed. Adding a new type of system means you need new ways of monitoring, new dependencies, different deployment.

replies(1): >>throwa+xT

>>hughrr+xf
What database can't handle an extra 25,000 entries?

That's... nothing. Databases handle billions of rows effortlessly using a b-tree index. So not really sure what point you're trying to make?

>>zigzag+mG
Somebody once said "cheap things are expensive". This idea applies to developers as well. Cheap developers will drive company through bumpy roads towards uninteresting plains. Good developers not only pay for themselves but bring orders of magnitude more cash in. Only thing that touches on this that I can find is "software craftsmanship manifesto".

replies(1): >>zigzag+u51

>>kasey_+7I
"mongo said" - the obligatory mongo DB is webscale video:

https://m.youtube.com/watch?v=b2F-DItXtZs

>>_ugfj+z2
What am I missing? With your description, nothing ever sets expire to 0. Can't look at the code comfortably as I'm on mobile.

>>chx+Sb
Could the process not delete the row, and not commit until it has finished processing? Otherwise, the transaction is rolled back when the db connection ends. This would not require timeout or clean-up.

One would need the appropriate transaction isolation, but shouldn't it work?

replies(1): >>Seattl+zx1

>>solips+0S
Of these, I think the last is the most compelling. Why introduce something new if you already have all of the infra and tooling configured for managing postgres? The parent mentioned testing the postgres implementation, but that seems like quite a lot less work than configuring the infra-as-code, monitoring, alerting, etc required for a new type of thing.

>>arpa+a9
For that temporary use case, how does it compare to memcached?

replies(1): >>jasonw+Ll1

>>nickjj+Tg
Totally agree. The number of job queues which use or can use Redis as the backing store is legion. Celery, rq, arq, Golang has asynq (inspired by sidekiq iirc), and that's off the top of my head. IMHO, it's just a better interface for implementing a job queue than an RDB.

It's also probably one of the easiest services to deploy and manage; often a one-liner.

Plus like you said, swiss army knife. It has so many uses. It's inevitable my stack will include a redis at some point, and my reaction is almost always "I should have just started with redis in the first place."

Is redis prone to golden hammer syndrome? Of course. But as long as you aren't too ridiculous, I've found you can stretch it pretty far.

replies(1): >>advent+ev1

>>hyzyla+(OP)
Because one can, does not mean one should.

>>nickjj+Tg
The elixir world has Oban[0] which implements quite a lot of advanced job features on top of PG. Admittedly it doesn’t quite have the usage of Celery and Sidekiq but most queueing libraries don’t.

[0] https://github.com/sorentwo/oban

replies(1): >>bright+0Q1

>>hughrr+gc
Can you explain exactly what doesn't scale well?

Databases scale well in general, it's one of the things they're built for.

A single database server with an indexed table (b-tree) and an SSD is extremely performant. (And yes there are other indexes that could be even more performant, but b-trees are already extremely fast.)

But the huge advantage of putting a queue in your database is being able to create transactions across both the queue and data in other tables that needs to change at the same time.

Not to mention one less extra tool to maintain, existing database tools for backups replication etc. automatically include the queue as well, etc.

As well as the fact that it's incredibly easy to make your queue behave according to whatever kind of custom business logic you might need. (Which is where additional indexes and b-trees can turn out to be necessary, and which is precisely what databases are great at.)

So to the contrary -- unless you're at Facebook-level scale, using modern databases for queues generally seems like an excellent choice. (And in the special cases where it's not, that's usually quite obvious.)

replies(2): >>Rapzid+BH1 >>sh87+IW1

>>zigzag+Ax
Money. The most common bottleneck is money and customers. Use whatever helps you get new customers faster.

Don't be scared of having to make changes in the future. Do the small amount of work it takes today to make sure your transition in the future is easy.

Transitioning from a SQL queue to redis it's only difficult if you have a bunch of SQL throughout your code. If you have that, you did it wrong.

>>blowsk+gr
Sure, but I also feel like this comment is based on an assumption that everyone works at startups. A lot of us work for larger, already established companies that are past that will we or won’t we survival stage. We are trying to design systems to solve needs for already validated products.

It might seem like everyone works for a startup, but big companies have a lot of engineers, and we still have to solve new problems there even though the company is not new.

>>hyzyla+(OP)
It's kind of like buying all sorts of running outfits and a camelbak to go on a 3 mile jog once a week. It's about overeager optimism of doing races and marathons in the future. Where in reality, you can get away with just running in the track pants and tshirt you already have on.

>>chmod7+K
I sorta do this, but my approach is more Redis-first than _just_ Redis. I try to see if I can use Redis for 99.999% of my operations and have a more durable store (like Postgres or something) as a "backup". The nature of Redis is such that even with some persistence features, we kinda have to assume that the data could go away at any minute, so I always build some way to rebuild Redis as fast as possible.

But I've run billions of queries per day against a single Redis instance with zero failures serving up traffic to large, enterprise-level customers with no downtime and no major issues (knock on wood). The only minor issues we've run into were some high-concurrency writes that caused save events and primaries to failover to replicas, and resulted in a few minutes downtime at the beginning of our experiments with Redis-first approaches, but that was easily mitigated once we realized what was happening and we haven't had a problem since.

Huge fan of a Redis-first approach, and while the haters have _some_ valid points, I think they're overstated and are missing out on a cool way to solve problems with this piece of tech.

>>skytre+vj
Your comment sort of explains why you would use your DB as a queue. It is a big task to migrate to a new system. If you already have Ppostgres or MySQL integrated and deployed. Using it as a queue may be the simplest option.

>>hughrr+gc
There are A LOT of use cases for a queue that aren't particularly computer intensive and you aren't worried about ever needing to scale - especially larger than your Postgres database.

So while this might be good advice if you will need to scale, it's certainly not blanket advice.

In the first case, I would much rather just have Postgres and not a separate queue to manage.

>>bob102+qN
Yep. Or even

"Do you really need SQLite? How to get away with a dozen secretaries, pencil, paper, and a fax machine"

I'm with ya. Why not use the right tool for the right job, given the right tool exists?

>>zapsta+Wm
Interesting! What kind of scale did you run it at?

Also keen to know if you saw any disadvantages with this approach?

>>killin+aJ
It's likely millions of internal requests, which as another comment mentions, is common in a number of industries.

>>gunnar+7a
then what's the point of this feature as part of the db, if it doesn't do persistance?

>>stickf+OK
Not sure why you were downvoted, but perhaps they were thinking about Redis transactions (lua scripts that can do multiple operations at once). If Redis is your datastore (groan), then you can transactionally enqueue tasks with your unit of work.

replies(2): >>arthur+Xc1 >>stickf+DW1

>>hughrr+LB
What about in terms of number of requests and requests per second?

>>hardwa+Cc
> then you listen to the CDC stream from the original DB you are connected to, correct?

Yes, exactly.

> This sounds like it might be vulnerable to dead consumers causing WAL to pile up without the right settings/periodic checks.

There are some subtleties around this indeed. You should monitor backlog of replication slots, so to identify inactive consumers as you say. Also, there are some corner cases you need to take care of: when listening to changes from a low-traffic database on the same PG host that also has a high-traffic database, the replication slot may not be acknowledged often enough. Debezium mitigates this by optionally writing changes to a dummy heartbeat table in the low-traffic database, so to advance the replication slot.

In Postgres 13 there's also a new option "max_slot_wal_keep_size" which limits WAL size retained by a replication slot. This prevents unbounded WAL growth, at the risk of consumers to miss events if they are down for too long.

All in all, proper monitoring and alerting is key.

>>truth_+Rx
I came here to post this. CREATE TABLE UNLOGGED is basically a redis in your postgres (with periodic dumps to disk), but with transactional joins into more persistent data.

>>aether+mB
If your data storage pattern involves a database write followed by a message queue insert, then the ability to wrap those in a transaction can be a good trade-off to avoid consistency failures between the writes.

Avoid consuming that queue directly though -- which is probably what you're thinking when saying this is a dumb idea and I tend to agree. Typically, you want to have a worker that loads entries into a more optimal queue for your application.

Bottom line though, there is not a single best "queue" product. There are lots of queues offering wildly different semantics that directly impact use cases.

>>nickjj+Tg
In the case of Python, Celery does support SQLAlchemy as a broker if I remember correctly. So in theory, you could still use PostgreSQL and also have a solid queue library.

replies(1): >>nickjj+vr1

>>cerved+pp
Murphy's law says that you're going to screw this up any number of ways. Maybe not you, specifically, but perhaps your coworker.

> committed atomically

Complex billing systems don't work that way. Worker processes are not always in these neat boxes of "done" or "not done". Much like rollbacks are a developer myth. If a process was that trivial then you wouldn't need a queue and workers in the first place!

> Also I'd detect a worker has died by recording the start-time and using a timeout.

There are many ways to solve this and many ways to get it wrong. Not working in UTC? Oops, better hope nothing runs during daylight savings changeover. Parent process died but worker finished job? Let's hope the parent isn't responsible for updating the job completion status. Large job is borderline on the timeout? Better hope parent process doesn't restart the job while the worker is still working on it. Network partition? Ut oh. CAP theorem says you're out of luck there (and typically there is at least one network hop between the DB server/controlling process and the system running the workers).

Probably the more straightforward solution is to give each worker an ID and let them update the database with the job they pick up. Then, something robust like systemd, would monitor and restart workers if they fail. When a worker starts, they find any jobs where table.worker_id = myId and then start back on those. But you still have network partitions to worry about. Again, not at all trivial.

replies(1): >>cerved+G61

>>kasey_+7I
Which is what I originally stated "boil down to not learning how powerful SQL and its programming extensions are".

One can only have bought into Mongo's story, by lacking the skills to understand how fake it was.

>>hyzyla+(OP)
I need an article how to get away with just MangoDB

>>axiosg+o3
>They use packages such as celery which happen use Redis under the hood but the user doesn‘t have to deal with these details other than pasting some Redis connection URL somewhere.

Celery doesn't use Redis under the hood.

However, you may use Redis as a message broker and/or result store.

https://docs.celeryproject.org/en/stable/getting-started/bac...

>>hardwa+je
I do think this is a product that everyone wants - support all popular models (relational, kvs, queue, log, etc) in a consistent, scalable, open source and easy to operate service. I'm not sure that this is actually possible but I think if such a thing did exist it really would dominate.

In the current reality today, implementing everything in Postgres is probably going to be slower to market (i.e. for a start-up) than using off-the-shelf products. When you do need to scale, this is when you get to learn about how valid your assumptions were in your abstraction layer - mostly likely in production. As a concrete example, Kafka isn't designed to work well with large numbers of topics. Similarly, InfluxDB isn't designed to work well with high cardinality time series. I think it is generally wiser to "skate where the puck is going" in this situation.

Of course, everything is a trade-off. Postgres is incredibly reliable (like insane) and simple to operate. I'd say for any kind of internal line-of-business type application where scalability is less of a concern you really would be doing your ops team a service by implementing everything in Postgres.

>>stickf+OK
You can actually do this with RabbitMQ - it has a feature called "Publisher Confirmations", where you send a message, and then wait for an "ack" in return.

replies(1): >>blowsk+L31

>>pjmlp+Vx
DynamoDB was designed to solve a very different problem than a traditional SQL database. When DynamoDB (and the other Cassandra flavors) were released there were no databases doing multi-master failover with high write throughput - we are talking about TBs, not GBs. It's not a coincidence that Google, Facebook, and Amazon all had to write to their own database at around the same time (BigTable, Cassandra, Dynamo).

With those new tools you had other companies building on top of those databases for far cheaper than a license to MSSQL or any other OLAP of choice would give you.

>>hughrr+gc
Lean startup begs to differ :p

>>aether+mB
For years I've been using RabbitMQ in small teams, and me as a one-man team too.

As long as you don't need clustering (even a single node can handle some pretty heavy load), it's actually really simple to setup and use - way easier than Postgres itself, for example.

My biggest beefs with it have historically been the Erlang-style config files (which are basically unreadable), and the ridiculous spiel of error messages you get if you have an invalid configuration. But thankfully RabbitMQ switched to a much simpler config file format one or two years back, and I understand the Erlang OTP is working on better error messages and stack traces.

>>derefr+9R
> In exchange, you get the ability to not have to set up / maintain any more infra than you already have.

And, you mean, a PGQ will not need any special monitoring other than the common DB metrics?

For instance, if I ever run a queue, it's just due ops diligence to know the length of the queue, average time an item spends in queue, throughput over time, among others. Are there standard monitoring modules that would check this for a PGQ? Because in exchange for setting up a proper queue cluster, compatibility and documentation for common stuff like this is also what you get.

The first one is particularly a sticking point for me. If you don't do it right, you will end up issuing a COUNT on a table periodically. You might say it's acceptable for most companies, because they don't need high-performance queues, but I specifically need my monitoring to be reliable during times of unexpected high load. Also, ideally, there is close to zero chance that my telemetry is the one that ends up causing me trouble.

Also, why does no one consider the added "surface for human error" that using a DB for a queue introduces. For instance anyone can drop in the DB and perform a query you were not supposed to. A malicious actor can update all outgoing emails in the queue to another recipient. If these are sensitive emails like password reset or OTP, good luck. A dedicated queue process does not allow such operations.

I have to say, it's very perplexing to me that people (especially in HN) would avoid using the right structure just because they have to invest a bit more work.

replies(1): >>derefr+W21

>>sh87+eM
Well, by your admission, you used Redis for a problem domain it wasn't suited for in the first place. How is this an argument for using in-database queues?

> use it blindly without fully understanding our requirements and how redis helps scale

I'm sorry I don't get how I could come across as advocating the use of Redis blindly. My point is if your data flow looks like a queue, then use a queue, don't hack a relational DB to become a queue. I think that's reasonable rule of the thumb, not going in blind.

replies(1): >>sh87+IK1

>>skytre+v11
> I have to say, it's very perplexing to me that people (especially in HN) would avoid using the right structure just because they have to invest a bit more work.

You're imagining this as if it's just the CapEx (the work of setting up Redis), but in reality it's the OpEx (the work of maintaining a Redis cluster) that kills you. Think of it in terms of the number of ops salaries you have to pay. At scale, this number goes up in proportion to the complexity of your stack.

replies(1): >>skytre+p51

>>Gordon+d01
If you’re using the same database for your queue as your business entities, then you can wrap both in the same transaction - I think that’s what the parent post is getting at. For example, save a customer entity and add a job to send them a confirmation email. As far as I know, that’s something you can’t do natively if you’re using RabbitMQ and PostgreSQL. Of course you can work around it, but when your workloads are low you don’t need to, which is great for early stage products.

replies(1): >>spulla+981

>>jayd16+oQ
It's a tradeoff between software and infra complexity.

I would argue that if you've built a system up from scratch, this is much easier to debug and maintain than a foreign piece of software. Rabbit is just way overkill if you have a couple of jobs per hour, for example.

>>boardw+UL
> You (or whoever is doing ops) will have to pay with your time to operate Redis until the product dies though.

Repeating my other comment: do you mean to say using an in-DB queue will not need special monitoring other than what's already standard metrics for DBs?

Since a queue is a different structure, I would want to know metrics that might be unfamiliar for standard DB monitoring. Queue length, queue throughput, etc. Can I get that from my DB queue with negligible cost?

Another thing, there might be standard practices and patterns for in-DB queues, often encapsulated in a nice library/middleware. But as databases are not queues there is no assurance from the vendor that those patterns will remain applicable in future versions.

> Paying attention to the tips/tricks/gotchas is something you'd pay once

Actually, I disagree with this. Everytime someone connects to that database, they have to remember that one special table which is used as a queue, and remember to tread differently where it is concerned.

Setting up a queue? You do that once and everyone who interacts with it will not forget it is a queue. Heck you can't even do your usual DB operations there. How can you not remember it is a queue?

>>hyzyla+(OP)
We’ve seen success in our current node.js application using pgboss [1] for job queueing. Since the primary database is Postgres, it’s nice to not introduce another dependency into the system, and take advantage of the infrastructure that we already have. The library supports most of what you’d expect from a job queue, and is crucially well maintained!

That being said, I agree with other comments that this is somewhat swimming against the tide. Redis is much more commonly used, and so if you don’t mind adding another service into the mix, I’d probably recommend going with Redis instead.

[1] https://github.com/timgit/pg-boss

replies(1): >>queich+691

>>kyleee+DP
That's solid middle management thinking right there!

>>hyzyla+(OP)
Why would you use a DB or Redis for job queuing when there are extremely inexpensive and highly optimized queuing systems in every major cloud provider?

I've had so many horrible experiences with DB-based queuing over my career once you get to a certain scale. Just use a message bus, bro!

replies(1): >>mixmas+ed1

>>derefr+W21
Actually, if you read the rest of my previous reply (as opposed to an offhand comment at the end) you will see that I'm considering the logistics of operating an in-DB queue as well.

Using an in-DB queue doesn't give you zero OpEx does it? Maybe you can make the argument that it's cheaper but it's cheaper for a reason: the job wasn't done properly, so to speak.

Both options introduce new failure modes and op costs into your system. Might as well do it properly if (and this is a huge IF in my opinion) for slightly more cost. When you run into a failure it's standard, maybe the solution is even one Google search away as opposed to realizing, one hour into a late-night debugging session that, shit, that table should be treated as a queue!

replies(1): >>derefr+Jb1

>>mirekr+yS
True

>>agf+7K
Is there any other solution other than going full STM on this?

replies(1): >>mathfa+AQ1

>>hyzyla+(OP)
I think it more or less depends on the mindset of the developer. For example from my observation developrrs from telecom or other big companies tend to turn every program into a "solution" i. e. it must have a full weaponry and other bells and whistles. Me? I'm the opposite amd won't add anything or fix anything unless it is really useful.

>>deckar+DY
I can't speak to billing since that's not my area but in general I disagree. A process doesn't need a non-trivial flow to need a queue and workers. I've done this with route optimization requests. They have a priority and need to be run in discrete processes. Once they are done, they are done.

The database records the start time, in the databases system time, in UTC.

The parent process can't update the job status, it can only cancel jobs.

There's a pool of workers. Each working may fetch a pending request or wait. Once it has fetched a request, which sets it as pending, no other worker is able to fetch this request. Only this worker may alter the status of its request and it may only set it to completed, and it may only do so with its results.

The worker periodically writes its status to a separate table. If such status has not been written in a timely manner, the worker is assumed to be stalled and a new child request is created, available to other workers.

Putting this logic into the database instead of a parent process, which managed a shared concurrent queue, has been a by far more pleasant solution. Granted, we're not talking large numbers of concurrent requests. Ten as opposed to ten thousand. If we're talking hundreds of concurrent, short requests, I can see this starting to become problematic.

I think there's huge difference in the needs of different implementations of queues and workers so I'm interested to hear more about workloads where this approach fails and specifically in which regards. In my experience, it's been pleasant.

replies(1): >>deckar+y92

>>blowsk+L31
In that case it doesn't help you that much since you can't transactionally send an email but you would get at least one attempt.

replies(1): >>blowsk+Ic1

>>pilif+p9
> LISTEN ties up one connection completely

I've seen this twice in this thread, but I don't know what that means. Can you explain a bit?

replies(1): >>sa46+Di1

>>hyzyla+(OP)
I just did a thread on the topic myself https://twitter.com/dosco/status/1402909104012623873

>>hyzyla+(OP)
And others like this thread on building a MongoDB like JSON database in Postgres in one line. https://twitter.com/dosco/status/1401413712842346501

>>fiznoo+t41
Out of curiosity, how many jobs are you queuing on average and how did it perform for you at that level?

I have been thinking about using the library, mainly because it's less operational hassle than running another service that has to be sticky to one host with persistent storage and backups.

replies(1): >>fiznoo+ID4

>>hyzyla+(OP)
And a search engine in Posgres instead of Elastic Search https://twitter.com/dosco/status/1400643969030127620

>>mianos+8l
I worked at a place using pg_queue (the person who created it worked there before going to Skype IIRC) and maybe it was working well operationally but it was not a great experience to develop on it. It was hard to generate events and debug/observe them (it was height years ago so the details are a bit fuzzy).

MS also moved from it afterwards when they acquired Skype but I guess it was more because of the shift to SQL Server.

replies(1): >>mianos+9M1

>>skytre+p51
I read your whole reply. I didn't see any mention of anything I'd consider an OpEx cost.

Re: Monitoring

Presuming you're already monitoring anything at all, adding monitoring for a new piece of infrastructure is a pure CapEx cost. You presumably already have monitoring infrastructure running. Its ingestion, by design, will be 99% idle — i.e. it won't need to be scaled horizontally proportionally to the number of components. The only thing needed, then, will be careful up-front design for the monitoring.

Which is usually also a cost already paid for you in advance when you use an abstract job-queue library. They all know how to expose Prometheus metrics endpoints, and they do the accounting to serve those endpoints efficiently (usually using process-local per-job-producer and per-job-consumer counters, which you must then roll up yourself at the PromQL level, taking the irate() to find spikes.)

Re: AAA

> Also, why does no one consider the added "surface for human error" that using a DB for a queue introduces.

Because the type of person who reaches for their DB first to solve a novel problem, is the same type of person who understands and makes use of their DB's security features, to the point that doing more DB-security-config isn't a complex one-off problem for them, but something they can whip off in a few minutes.

For a simple option, you can create two service-account DB roles: a queue_producer, and a queue_consumer. You can put all the job-queue implementation tables in a schema owned by the queue_producer; and then grant the queue_consumer SELECT privileges on all the tables, and UPDATE privileges on some of them. Then, nobody but the job_producer (or a DB superuser) can create or destroy jobs; and nobody but a job_producer, a job_consumer, or a superuser, can read or modify jobs. (Your job-queue abstraction library within your service usually maintains its own DB connection pool anyway, so it's no sweat to have those connections use their own credentials specific to their job-queue role.)

For a more complex — but perhaps useful? — option, the tables themselves can be "hidden" behind stored procedures (DDLed into existence by the abstract job-queue library), where nobody has any privileges (incl. SELECT) on the tables, only EXECUTE rights on the sprocs. And the sprocs are carefully designed to never do anything that could have an unbounded CPU time. Then anyone can "check up on" or even "insert into" the job-queue, but nobody can do anything "funny" to it. (My god, it's an encapsulated API!)

Once again — the libraries that abstract this away, already think about concerns like this, and choose one or the other of these options. That's why libraries like this exist, when the "core premise" is so simple: it's so there's a place to put all the fine details derived from man-years of thought on how to make this approach robust.

-----

On a completely different track, though: having a queue in the DB can sometimes be the optimal (i.e. the "if we had infinite man-hours for design + implementation") engineering decision. This case comes when the thing the queue is operating upon is data in the DB. In such a case, the DB data modification, and the job's completion, can succeed or fail together atomically, as part of a single DB transaction.

To accomplish the same thing when your queue lives outside the DB, you usually end up either needing some really gnarly distributed-locking logic that both your DB and your app layer need to know everything about (leaky abstraction!); or you need to completely remodel your data and your job queue into an event-streaming paradigm, so that you can "rewind" one side when the other side fails.

replies(1): >>skytre+Ig1

>>ickyfo+5z
That is very interesting, thanks. I will look into a possible migration to Redis, which in any case is so lightweight and simple to run, it seems.

>>hyzyla+(OP)
"Do you really need < insert well-tested, verified technology > ? Why not just hand-roll your own in < different technology > ?"

Hopefully the emphasis is clear why this is silly.

replies(1): >>meitha+5d1

>>blowsk+gr
You say that as if rabbitmq takes more than an 8th grader to set up in a few hours?

replies(1): >>Gordon+9i1

>>blowsk+gr
no. he/she actually has a point. a database is not a queue

https://mikehadlow.blogspot.com/2012/04/database-as-queue-an...

replies(1): >>geggo9+It6

>>spulla+981
It doesn’t help guarantee the email is sent, but it makes it easier to find and handle unhappy edge cases, which is where a lot of time is spent in an early-stage startup.

>>politi+lX
Real time trading systems can totally get away with only using Redis, outside of historical data (which obviously you'd stick into Postgres or your favorite flavor of DB)

>>void_m+jc1
Indeed this is silly, to me this sounded like "Do you really need a drill? Just hammer this screw down!".

replies(1): >>void_m+be1

>>jugg1e+E41
Some folks are not using the cloud for everything. Portability may be an issue too.

replies(1): >>jugg1e+z2m

>>RedShi+ph
please don’t. source control/versioning/deployment become a nightmare

replies(1): >>taffer+Zn1

>>hyzyla+(OP)
The issue with Redis is that it's distributed story is not great. I wonder if their Raft experiment is finally GA or not. https://github.com/RedisLabs/redisraft

>>blowsk+gr
Not sure. All messaging solution brokers walked away from relational database based storage long time ago (I've seen how badly JBoss 4 messaging with persistence sucked) and for a good reason as either database was occasionally killed by messaging traffic, or messaging traffic was crawling because database was too slow.

In these days with Docker, ready cloud solutions getting the message broker is so simple that there is no reason to use database as a broker. If we know that our traffic will be so small, just install DB and message broker on the same VM, to avoid spending more on hardware or hosting.

>>gizdan+D8
When you commit to Postgres and the server acknowledges it, you know for sure that it's been written to disk and that it will survive anything but a hardware disk loss (or, obviously, system/FS bug). When clustering is enabled with synchronous writes, you can also be confident that the data has been recorded to another node as well.

With redis clustering, there's no guarantee the data has been replicated. I'm not even sure there's any guarantee the data you just asked to be recorded be stored even once if a power outage happens immediately after the request.

>>meitha+5d1
As with all things, if the goal is exploration, hell yes try this out.

If you're actually trying to solve the problems Redis is meant to solve, just use Redis. Otherwise you become the sole maintainer of a glued together, suboptimal version of Redis based in Postgres. For most people, they'd rather build whatever caused them to need Redis in the first place, rather than building their own internal worse-version-of Redis.

>>bryanr+6K
mmmm no

They allow overbooking the flight but you can't book the same seat twice.

Overbooking can make financial sense -- even if you need to hand a little compensation from time to time it's still better than flying with empty seats. Of course it sucks big time for the travelers especially if there are no volunteers. IDB is miserable. https://www.transportation.gov/individuals/aviation-consumer...

replies(2): >>mianos+1v1 >>bryanr+AB1

>>abhish+2G
I had a similar question asked of me in an interview. The interviewer was also surprised by my solution of letting the database decide the winner.

replies(1): >>abhish+8G2

>>berkes+ik
Most web apps I've worked on have had queues in the database. The operational simplicity of only having a database has far outweighed the code complexity of using the relational database as a queue. However the performance would not have scaled. Luckily the performance was well above peak load of the actual systems.

>>hughrr+gc
It can absolutely scale well if you design the database properly. However, it’ll probably be fairly high latency compared to a Redis or Kafka stream.

The things that often make databases appear slow is that they’re typically configured for large random reads rather than small uniform writes. And they have insane initial response times because of all the IPC they typically do when starting a session.

But aside from that, if you use a heap table without an index you’ll can probably get similar throughout to Kafka. And if you turn Fsync off you might even get up to REDIS speeds.

replies(1): >>onepla+yf1

>>hyzyla+(OP)
I tend to ask the opposite question more often.

Do you really need Postgres?

>>sontho+ff1
It's possible at scale, but the engineers that can do that are usually 1:20 outnumbered within most organisations. (when scoping out FAANG and co) Asking a developer to design a database is not even normal process today, unless the ORM does it for you, very little actually is done with the database. It might as well be a flat file...

replies(1): >>api+Mf1

>>skytre+KL
> Absolutely and that reason is, you are still a small team, with a small user base to boot. That's fantastic opportunity to learn a new technology and build on it properly! Remember everything is easier in software engineering if you assume you have no users[1] and your situation is as close as it gets to this ideal. Leverage it.

I have to disagree. Of course code quality is important, but building things "properly" because "we may need it later" is a great way to kill a project with complexity. KISS, YAGNI. An early startup is, IMHO, not a good place to learn about new frameworks while getting paid - you're on borrowed time.

Make a back-of-the-envelope calculation about how much throughput you need. E.g., if you expect to have 10,000 users, and each may make one request per hour, you're dealing with 3 qps. Anybody who wants to bring in a new dependency for this, needs some talking to.

(If you already need Redis anyway and it's a better fit than Postgresql, then sure, go ahead.)

replies(1): >>skytre+Ak1

>>onepla+yf1
That’s sad. I learned relational DB design before I even really got into systems programming and it was really a valuable thing to learn. You can be so much more efficient and less error prone with normalized data structures.

There’s a reason SQL is still around after all these years. It’s basically math and while the syntax is a little moldy the principles are eternal.

>>derefr+Jb1
> For a simple suggestion, the job queue can live in a schema that is owned by a job-queue service-account role. Nobody other than that user (or a DB superuser) can issue any query in there.

Yes service accounts are fine and dandy but [a] isn't that more hoops borne by choosing the wrong tool for the job (gotta stay mindful of those permissions, you can have an Ansible typo, etc) and [b] as I said, proper queue services would not even allow tampering of enqueued data. For no extra cost other than installing the service. In contrast, your service account falling into a malicious actor is still a threat vector.

> For a more complex — but perhaps useful? — option ...

Now this is just a queue service with extra steps running in a relational DB instead of natively as an OS process. You did cite it as just an option but I don't see why this is an attractive option.

> I didn't see any mention of anything I'd consider an OpEx cost.

It seems to me we have differing definitions of OpEx and, judging by your previous comment, you value this almost exclusively in terms of "the number of ops salaries you have to pay". Even if I play by that definition, I could tell from experience (and also corroborated by other commenters here, in other subthreads of this discussion), that operating a Redis cluster does not need extra warm bodies. The people who monitor that your app responds, your servers aren't running out of memory, could take on the task with little to no additional training.

The experience I want to cite, bearing in mind of course that everyone's mileage varies: in my previous job, customer base of 2M and growing across two countries, we operated and monitored a redundant Redis cluster with an ops team of 3. In my first job I talked about in another subthread here, we managed three high-traffic Redis queues (and a few other smaller-traffic special cases) with a team of 7. In both jobs, we also did active across-the-stack development; we weren't dedicated to monitoring, nor to the components related to the queues.

In fact I would argue running a DB (queue or no queue) is just more complex than running a queue service. Rather telling is the need for "DB Administrators/Specialists" but no such equivalent for Redis or other proper queueing solutions.

> Which is usually also a cost already paid for you in advance when you use an abstract job-queue library. They all know how to expose Prometheus metrics endpoints...

Honestly, this sounds fine but I'm rather wary of the caveats that might be present. How do they do it efficiently? What do I need to keep in mind to keep things efficient? As such, unless you can point me to your preferred job-queue library which does all these wonders, I have to reserve judgment.

Edit: I just remembered mentioning this in another subthread. Even if you have a fantastic library/middleware abstracting all those queue concerns, that ties you in with that library. If someone wants to build integrations into your queue, they have to play by that library. If majority of your app is in Java (and so is the library), and Data Science (who, in this example, uses Python, not improbable to happen) wants to share your data pipeline, if that library isn't available for them, tough luck I guess? More dev time for someone.

And also, whatever underlying DB features or semantics your library might rely on in order to enforce a queue structure, you can't be assured that future versions of your DB would support those features because, you know, your DB isn't really in the business of queues. It opens you up to running an outdated DB version just to keep that queue functionality.

replies(1): >>derefr+hn1

>>ikiris+Ac1
To be fair, you can pull and run the Docker image in a few minutes, and the defaults will work out-of-the-box for a lot of folks.

If you don't need clustering, RabbitMQ really is pretty simple to setup.

>>postgr+O81
When a client connects to Postgres, Postgres creates a new process just for that client. Separate processes are great for isolation but it means Postgres connections are a bit expensive (this is an active area of improvement). Postgres really starts to struggle once you a have a few thousand connections, even if those connections aren’t doing anything.

The common workaround is to use a connection pooler like PGBouncer so that clients reuse connections. This approach doesn’t work for LISTEN because typically a client will listen for its entire lifecycle so you can’t share connections in a pool.

replies(2): >>taffer+cp1 >>postgr+fd2

>>hughrr+NC
If latency is important, I would argue that ordering doesn't help you.

That is, if you have the computational resources to service your entire queue within the required latency, then ordering doesn't matter, because workers could simply claim the first available random task, and it doesn't matter who finishes first or last. If you don't have the resources, then ordering still doesn't matter, because you are blowing your latency budget anyway.

For example, say users are uploading photos which go into a queue for scaling, cropping, etc. Let's say 10 users per second are uploading a photo. In the best-case scenario, the queue is empty, so all users should see their photo fully uploaded quickly. Worst-case scenario, your queue has a million tasks pending, which benefits a handful of early users who see their photos completed, whereas most users will simply wait forever. Prioritizing early users (i.e. queue order) doesn't benefit anyone when everyone will be miserable, so it's an arbitrary choice.

The ordering imposed by queues can create an illusion of fairness when task execution time (i.e. end-to-end latency) is not taken into account. If one user uploads a 1 gigapixel photo, it's most likely going to take up more processing time than other users, and again, there's no reason to prioritize that user over any other user just because their request arrived earlier.

If you have random-order delivery, you can still measure latency, capacity, and utilization, as well as have retries and acking, and utilization-based scaling.

replies(1): >>Rapzid+iI1

>>hughrr+GA
What would you use redis for, if anything?

replies(1): >>yawara+Wo1

>>_ugfj+z2
You really really wanna do everything (locking, working, unlocking, deleting or completing) in a transaction, and use SKIP LOCKED to find work. It's actually pretty high performance when you so this.

replies(1): >>phumbe+0k1

>>miohta+tx
> a sensible risk tradeoff

Note the tradeoff doesn't make sense as soon as you're operating at a meaningful scale. A small likelihood of failure at small scale translates to "I expect a failure a million years from now", whereas at large scale it's more like "a month from now". Accepting the same percent risk of data loss in the former case might be OK, but in the latter case is irresponsible. Provided whatever you're storing is not transient data.

replies(1): >>miohta+tc2

>>hughrr+yB
Currently at work we use a postgresdb for queues for long-running persistent jobs, but I'd like to move away from that model since I have doubts about how it'll scale.

I've thought about using rabbit mq, but have a few questions: - it appears that when a job is consumed, it's gone. How would I maintain a record of the jobs & status of the job in rabbitmq? If I wanted to display job updates how would I handle that? I didn't think you could update a message once it's already in the queue

Or am I missing the mark? Do I want to separate the business entity "job" that maintains a status and updates and such from the "thing that dispatches jobs to workers"? And when a worker Has the job it just updates the business entity in the database?

replies(2): >>doktor+kq1 >>Rapzid+EJ1

>>ukd1+uj1
do you have a more detailed example on how this could look like?

replies(2): >>taspeo+zl1 >>subleq+Yq1

>>blowsk+gr
I’ve watched companies maintain their non-scaling DBs all the way to the grave.

It’s a killer.

If it’s “just a cache” OK. Fine. If you need it for something important like queuing or certain caching scenarios - be aware it won’t scale well.

>>jayd16+oQ
There's off the shelf libraries to do this in nearly every language.

It's always about tradeoffs in context, but I have seen plenty of instances of someone setting up a rabbit cluster for something that could be done trivially in their existing db cluster via the above.

>>yongji+Hf1
> but building things "properly" because "we may need it later" is a great way to kill a project with complexity

Emphasis added because I feel like I addressed this in the paragraph immediately after the one you quoted:

> Plus, as me and others keep saying, Redis (and other proper queues) isn't a complex addition to your infra

I'm speaking out of experience and, as I already pointed out in another subthread, Postgres is far more complex than Redis. Consider the presence of "DB Admins/Specialists" and the lack of counterpart thereof for Redis and other queuing solutions.

Of course, if queues are not central to how your platform operates, you might be able to get away with Postgres. I still advise using Redis as a reasonable hedge against someone famous tweeting organically about your service because in this case, you don't want your DB to go down because some queue table had a surplus of transactions (or vice versa).

Not to mention, at an early stage, your tech decisions set precedents for the team. Maybe you have 10K users with a low qps but soon you are sending marketing emails to them and your system has periodic bursts of queue activity for all 10K users at once. When discussing this marketing "feature" rarely anyone thinks, "Hey we can't do that with our Postgres queue", rather "Yeah I saw functions in our codebase for queuing---this is doable". This is a small effort but a huge technical investment for later on.

replies(1): >>rkk3+H12

>>_ugfj+z2
Thanks for posting this! It is relevant to me right now and great example. Especially since I'm using a new to me database, Spanner.

Specifically I similarly pull rows into a 'queue' like that linked sql by WHERE primary key, primary key, and queue_status ORDER BY timestamp, uuid LIMIT 10. I was worried what would happen with many requests around the same time if it would accidentally pull duplicates into 'queue.'

I think hopefully I've implemented Spanner locking read write transaction correctly and that this won't cause problems at scale (we're not even close to google scale lol).

>>phumbe+0k1
The article starts by discussing SKIP LOCKED

>>kwdc+4U
Mostly boils down to Redis having a richer API, and memcached being faster / more efficient. The new EXT store stuff allows you to leverage fast ssd's to cache stupid huge datasets. Memcached is also one of the most battle tested things out there in open source. I've used them both plenty over the years, but tend to lean towards memcached now unless I really need some Redis API feature.

>>skytre+Ig1
> Now this is just a queue service with extra steps running in a relational DB instead of natively as an OS process. You did cite it as just an option but I don't see why this is an attractive option.

Your DB then shares a WAL log with your queue. Meaning a single managed physical replication pipeline for them both. Meaning only one set of leader-election issues to debug, not two. Meaning one canonical way to do geographic high-latency async replication. Meaning disaster recovery brings back a whole-system consistent snapshot state. Etc.

Honestly, if I had my way, every stateful component in the stack would all share a single WAL log. That’s what FoundationDB and the like get you.

> In fact I would argue running a DB (queue or no queue) is just more complex than running a queue service.

Well, yeah, but you usually need a DB. So, if you’re going to be paying the OpEx costs of the DB either way, then you may as well understand it deeply in order to wring the most productive use you can out of each OpEx dollar/man-hour spent.

(I feel the same way about Redis, as it happens: if you need it, and are locking your code into its model anyway, then you may as well take advantage of its more arcane features, like Redis Streams, Lua scripting, etc.)

However, maybe our company is uncommon in how much our service literally is doing fancy complex DB queries that use tons of DB features. We’re a data analytics company. Even the frontend people know arcane SQL here :)

> that ties you in with that library

The difference between what you / apps / abstracting libraries do in Redis, and what they do in an SQL DB, is that in the DB, the shape of everything has to be explained in a vendor-neutral manner: SQL DDL.

Sometimes Redis-based solutions converge on conventional schemas; see e.g. Sidekiq’s informal schema, which several other queuing systems are implemented in terms of. But when they don’t, there’s nothing you can really do — beyond hacking on the libraries involved — to bring them into sync.

In an SQL DB, anything can be adapted into the expected shape of anything else, by defining SQL views. (Heck, in an SQL DB with Redis support, like Postgres with redis_fdw, the Redis data can be adapted into any shape you like using SQL views.)

And that’s further enabled by the fact that the DB had received from the app, through DDL, a schema, that you can examine, manipulate, and refactor; or even synthesize together with other schemas.

> you can't be assured that future versions of your DB would support those features

You can if those features are in the SQL standard. I’ve never heard of a DBMS regressing on its level of SQL standard support.

replies(1): >>skytre+6s1

>>rantwa+kd1
I never understood this point:

- You have a separate schema for your procs

- You define your procs in files

- You write tests for your procs

- You put your files into git

- Then, in a transaction, you drop your old schema and deploy the new one

replies(1): >>rantwa+PF1

>>Aeolun+z3
I've worked at plenty of businesses with annual revenue in the tens of millions using "non-scaled" solutions without any issue.

If we rephrase question to the material reality of "What will we do when we're pulling in $20,000,000 a month?" then you can see how silly it is.

I dunno, just buy a company then and use their stuff? It's literally not a problem that a pre/early-revenue company should or realistically can solve. It's part of "founders delusion" - the same kind that make them get a huge office that's mostly empty or all that unused computer hardware sitting in the closet. It's just gambling.

It'd be like saying "we have $500,000 in the bank at this new restaurant and need to be ready service 5,000 national locations with our staff of 5."

I mean no, by the time that's no longer just a mental exercise (as in you have, say 4,700 locations) you'll be a different kind of institution better able to solve the problem. Literally just forget about it. It's so profoundly irrelevant...

Solve the problems that are actually real, you've got enough of them, I promise.

replies(1): >>simonj+Vq1

>>mikekl+8j1
I assume a key-value store.

replies(1): >>hughrr+Ua2

>>sa46+Di1
You need one connection per worker thread, and realistically you would only have one worker per cpu core. So how many LISTENing connections do you really need?

replies(1): >>sa46+Mv1

>>_ugfj+z2
Problems with this approach:

1. Recovery.

2. Concurrency.

3. Polling.

In more detail:

1. Recovery. Suppose a worker dies while processing a job. The job should be retried, but how do you know this, since expire > 0? You can impose a timeout, but that has drawbacks -- there might be long jobs you repeatedly start but never finish. To recover failed jobs without imposing a timeout, you'd have to run both the UPDATE and the job processing inside a transaction. That way, a failed job can result in an implicit ROLLBACK, freeing the job for future worker attempts.

2. Concurrency. So now recoverable workers are trying to claim jobs inside a transaction. If there are 10 jobs and 10 workers, we want all 10 jobs processing concurrently. However, each worker SELECTs the first item_id and tries to UPDATE that same row. 1 worker wins. The other 9 workers block while that transaction completes, then update zero rows. Concurrency will hover around 1.

3. Polling. In the transactional UPDATE approach, there's no way to tell whether there are free jobs short of trying to claim them via UPDATE, which blocks. So you must poll, and either burn cpu on quiet queues, or introduce an artificial delay into job processing times. The beauty of the SKIP LOCKED approach is that it can tell whether there are free (not currently processing) jobs. Even if you wake all 10 workers up via LISTEN / NOTIFY for a lone job, 9 workers in the thundering herd will fail to claim the job, and go back to sleep.

I blame TFA for not sufficiently explaining the SKIP LOCKED approach. A much better explanation is here: https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...

replies(1): >>jsmith+8f6

>>hyzyla+(OP)
I feel like sanity is being restored. Maybe I’m lazy, but yeah I use PostgreSQL for everything I can. Zookeeper is great, but I’ve used PostgreSQL for distributed locking & queueing. Sometimes it’s quicker for me to write a 10–20 line algorithm than install a new piece of infrastructure.

>>Jarwai+Tj1
You might want to look at something like SWF or Temporal

>>kristo+zo1
I feel like I've spent half by career telling people this.

My general recommendation is to engineer for only 1 order of magnitude greater than you need today, but even that stops applying eventually.

replies(1): >>kristo+bv1

>>phumbe+0k1
https://github.com/gavinwahl/django-postgres-queue

>>trulyr+lY
It is supported but it's classified as experimental and not officially maintained by the core devs based on their documentation.

>>derefr+hn1
> Your DB then shares a WAL log with your queue...

I feel like the understanding we can come to here is that we have differing definitions of necessary complexity.

You did mention you work in data analytics and I have worked and am working in more traditional product-oriented gigs. Everything you mentioned are nice, impressive even, but to me they are toys. I don't need a queue with a WAL stream, or PITR backups. Queues, to me, are inter-process communication mechanisms, or a means to concurrency. In fact, worst case, you can delete the queues or restart Redis even without triggering a disk write (though I note this is another feature that comes free with Redis); it would inconvenience our customers but they can always just retry.

Of all the benefits you mentioned, leader-election is the only one I could make a good case for.

> then you may as well understand it deeply in order to wring the most productive use you can out of each OpEx dollar/man-hour spent.

Understanding it is one thing but using that understanding to create/develop (CapEx) and then maintain (OpEx) an in-DB queue feels like unnecessary costs. CapEx to install Redis is practically nil, and you seem to agree that running it is far simpler OpEx than PG too ("Well, yeah, but...").

Of course, I keep in mind your point about libraries doing the dirty work. But if it all boils down to a third party in the end, the point I'm replying to above is rather moot no? And if it comes to a third dependency anyway, I've already reasoned out my preference for an actual queue solution.

> explained in a vendor-neutral manner: SQL DDL

> You can if those features are in the SQL standard. I’ve never heard of a DBMS regressing on its level of SQL standard support.

While this is an ideal case, my experience so far shows that every major DB package relies on their own tweaks to the SQL standard, and so "If those features are in the SQL standard" turns out to be a pretty big "if" to ask. I don't worry about vendors regressing on SQL standard support but rather that the libraries for DB queues are relying on the non-SQL-standard features/behavior to mock out a full queue functionality---non standard behavior that has no guarantee of consistency across versions.

I mean, if there's a DB queue middleware/library that works across DB vendors, be it Postgres, Oracle, Maria, MySQL (with a reasonable storage engine, perhaps) then that's a convincing argument that the SQL standard can support queuing sufficiently. But otherwise, this counterpoint is a hard sell for me.

replies(1): >>derefr+gy1

>>hyzyla+(OP)
Security is another factor in choosing not to use a database for things like queues. I like to keep access to the database as tight as possible. I don’t want to start dishing out network and account access to services just because they need a queue or a distributed lock. I could run a separate database instance but that’s worse than just running Redis.

>>Michae+Tr
M2M tables shouldn’t be a problem except at the very high end, in which case you’ll have a number of mitigations at your disposal. ?

>>truth_+Rx
Unlogged tables ARE NOT in memory tables. They are written to disk like every other table, but unlogged tables don‘t have use the wal and are therefore much lighter.

>>before+9r
Generally used for caching expensive lookups, including job queues, and other nifty patterns. Before that memcached was most popular, but it didn’t save to disk, where redis eventually does. Before that, it was roll your own or ram disk.

>>chx+ne1
They obviously don't have the transaction semantics worked out. I have been at the gate and the person in front of me had the same seat number. They read it out. When I presented my pass the girl politely took my ticket and printed another one. If the person was not right in front of me I would never have known why. It is not unusual.

>>simonj+Vq1
The response is easy "find me an otherwise successful company that actually collapsed from failing to scale things like a database and we'll use it as a case study."

Security and PI stuff is unfortunately something you do have to be super anal about these days but scalability? no.

>>kortex+7U
Redis and I have a golden hammer agreement. I keep finding new ways to use it and it just keeps working.

>>virapt+lD
I find the downvotes weird without any explanation why the raised issue doesn't sound important.

>>taffer+cp1
You generally have more than one database connection per thread. As an example, consider Node.js which acts like a single threaded process. You’d probably want to be able to handle more than 1 database query concurrently since network latency tends to dominate OLTP requests.

How you setup LISTEN and NOTIFY is app dependent. In a multi-tenant database, you could have 1 NOTIFY channel per tenant.

As you scale, you probably do something that listens in a smarter way, maybe 1 Go channel per client with a single LISTEN instead of 1 database connection per client. The downside is that now the app code is responsible for tenant isolation instead of the database.

replies(1): >>taffer+Vs2

>>johbjo+rT
That's requires maintaining more db locks.

>>skytre+6s1
> it would inconvenience our customers but they can always just retry.

We have very different things in our queues :) An MQ, to me, is a place to put stuff that needs reliable, at-least-once delivery. Like, say, invoices. User never receives that, bad things happen.

There's very little difference between that type of data, and the data in your DB, while it's "active." It's only once the data in the queue is fully ACKed and no longer relevant that the data behaves more ephemerally than first-class DB state.

> and you seem to agree that running it is far simpler OpEx than PG too

I did not say that. In our stack, we run a Redis cluster on top of Kubernetes, because it's nominally semi-stateless. In reality, it's our biggest SPOF. It fights the container daemon around memory allocations (why does a Redis container that uses 2GB memory steady-state need a 64GB memory limit to finish starting up!?) and has weird dynamics around startup readiness (i.e. Redis started from a large AOF-with-RDB-prolog file will start serving requests for just a moment, then immediately stop again to finish loading. This confuses k8s into thinking it's done with startup-probes and is ready for readiness-probes. Which it's not.)

Redis, when configured to be both Highly Available and durable (which is not your use-case, I know) is really just less-robust than a DB, with less management tooling, fewer docs, fewer people who know how to scale it, fewer third-party solutions pre-adapted to expect it, etc. Because, to paraphrase: Redis isn't really in the business of being a database. And yet that's exactly what you end up with, when you try to use Redis as a reliable at-least-once MQ: a database. Just, a kinda sucky one. (One that doesn't even have a concept of users with different privileges; let alone RBAC!)

And yet, that actually is the niche of Redis. It's the durable, client-server cousin to in-process peer-mesh state-sync solutions like Hazelcast/Mnesia. Which makes it kind of weird that it's not particularly tuned at solving the problems in the one niche unique to it.

(I have nothing against Redis! It has non-unique use-cases where it works great, e.g. when it's being used as "memcached with more data structures", or as a coordination/locking server instead of Zookeeper, etc. A robust reliable MQ it is not. My argument is that your average RDBMS is closer to being a robust, reliable MQ than Redis is. Though, of course, you'll get the best robustness+reliability for that use-case out of a "real" MQ — Kafka or NATS or something.)

> but rather that the libraries for DB queues are relying on the non-SQL-standard features/behavior to mock out a full queue functionality

They don't have to, because everything you need to implement reliable queuing semantics is already part of [the lowest-common-denominator part of the] SQL standard [that everything that calls itself an RDBMS implements.]

These libs have DB-specific adapters for two reasons:

1. There's no such wire protocol as "SQL" — every RDBMS speaks its own wire protocol to convey your SQL statements to the database and convey result-sets back; and so, unless you're using a language with a common API abstraction for RDBMS client libs to be built against (e.g. JDBC), then each such client lib ends up presenting its own distinct API to callers, which needs a separate adapter written for it.

2. with SQL, reliable queuing is "easy", but knowing about enqueued items (with as minimal an overhead as possible, scanning as few additional items as possible) is hard. SQL says nothing about how to get push-notification-to-the-client type information out of the DB, and so every DBMS implements its own mechanism for this. These are pure optimizations using pure-ephemeral code, though; the schema of the queue in the DB remains the same either way, such that the data could be exported from one RDBMS to a .sql file, imported from that .sql file to a different RDBMS, and the only thing needed on the client side would be switching adapters.

(In theory, presuming a common wire-protocol abstraction interface lib like JDBC, you could have a "standard SQL" adapter. It'd just be annoyingly low-throughput. Like using HTTP polling when you could be using WebSockets.)

>>chx+ne1
>They allow overbooking the flight but you can't book the same seat twice.

ok, but how do you overbook a flight without booking at least one of the seats twice?

replies(1): >>Jailbi+lJ1

>>nickjj+Tg
Nowadays Rails has good_job[0], which lets you stick with Postgres for background jobs until you need more than a million-ish a day.

[0] https://github.com/bensheldon/good_job

>>maniga+QM
Thanks, I had no idea!

>>taffer+Zn1
Do you have access to your production database?

How do you write tests? With what data are you testing?

How do you handle backwards compatibility? (For the code? For the data?)

Do you do this upgrade during deployment? Can you do blue/green deployments? What about canary deployments?

Is this transaction where you are dropping the old and creating the new part of the code or part of the database? (the actual code that does this)

How do you profile the performance of the said code?

replies(2): >>e_prox+Um2 >>taffer+lq2

>>kyleee+DP
Reminds me of Starbucks does not use two phase commit. https://news.ycombinator.com/item?id=6229004

>>crazyg+iU
I believe most people simply have no concept of how performant modern RDBMS are. "Joins are slow" and all that.

>>atombe+2j1
I think you'd still desire loose ordering even if you didn't care about strict ordering. If you always have 50 items in the queue you could be blowing the latency budget on random items because they are getting passed over where as with some form of loose chronological order you could still be completing everything within a certain amount.

SQS without FIFO works about like this AFAIK.

I agree with the gist of everything you are saying though, particularly the point about signaling. As soon as you want to start keeping track of a process that state is going to be stored somewhere outside the queues.

>>bryanr+AB1
You don't assign all the seats up front. Some are marked to be assigned at the gate.

replies(1): >>abhish+mH2

>>Jarwai+Tj1
> Do I want to separate the business entity "job" that maintains a status and updates and such from the "thing that dispatches jobs to workers"? And when a worker Has the job it just updates the business entity in the database?

If you ever really need to that's one way to do it but consider I'm imagining 50-100k state updates per second as being reasonable depending on how the data and queries are laid out. You can always use NOTIFY as discussed to cut down on the amount of worker queries for new work before having to completely push signaling to a separate system(which would likely still require occasional polling and/or a background process to re-signal work that hasn't been picked up).

It's interesting to consider that AWS Step Functions charges(or did charge) by the state transition.

>>skytre+i21
We needed queues. We used Redis. That fits the domain.

Problem was there wasn't a good answer to "How much redis does your team need to know to put it in production".

We thought we knew it well enough, we thought we knew what we were getting into, and we thought so many others are using it for this, we should be good. That is makes a difference, clearly.

>>conrad+La1
Marko Kreen? He still maintains it so I guess some people still use it and the associated logical table distribution system.

>>dereth+dU
IMO one of the reasons that works for Elixir is that Elixir itself is built for all sorts of concurrent workloads.

In most other languages, you’re sending everything to the queue. With Elixir you only need a small subset of background work to go to a queue, which is usually work that would stress the database.

>>spockz+D51
Yes a simple SCRM will do.

replies(1): >>abhish+wG2

>>skytre+vj
> Whatever happened to use the proper data structures for the job? PostgreSQL and MySQL are, at the end of the day, b-trees with indices. Throw in relational properties and/or ACID too. Those aren't properties you need or want in a queue structure.

How is relational storage engine with support for transactions, document/json, ACID, hot-standby "the wrong data model" for a queue?

>>hyzyla+(OP)
Remember when Circleci kept going down due to weirdness with their queued jobs in their database?

Read here https://status.circleci.com/incidents/8rklh3qqckp1

>>politi+lX
If Redis is your primary datastore then sure. But the problem domain for which "Redis is your primary datastore" is appropriate is relatively narrow.

>>crazyg+iU
I very strongly agree to the idea of using the tools you already have and know to solve the problem at hand and release it. Then observe where it could use help, then seek options on specific products solving those problems and only then can we get to real good questions that can pierce the veil of marketing and the comfort of herd mentality.

>>skytre+Ak1
> I still advise using Redis as a reasonable hedge against someone famous tweeting

Early stage startups die because of lack of PMF. Diverting focus and resources away from finding PMF kills companies. Most companies should focus on the product, tech debt be damned.

>>blowsk+gr
This. So much this.

>>hughrr+gc
One of my businesses is running a software solution build in 2012. Ever since it is using queuing via a database. It's simple, easy to maintain. And scales really well. We are talking about 10k queues per second. Yes it disattracts developers like you. They usually prefer the latest fancy trending tool. luckily those types are often responsible for overengineering things and overly complex solutions.

>>cerved+G61
> A process doesn't need a non-trivial flow to need a queue and workers.

Right, it doesn't have to be. Blockchain tech such as Bitcoin is effectively this sort of thing. The idealized type of queue worker would be a pure, referential transparent function that crunches some data and returns an output. More difficult to write but still acceptable: idempotent workers.

> the worker is assumed to be stalled and a new child request is created

This may work for your use case. I'd be worried about an infinite process loop here. Where a problematic job continually stalls a child process and gets entered back into the queue forever, because there needs to be logic to raise an exception and say: "hey, take a look at this job... something may be wrong with it" and then put that job in a holding area. This is also why I'm a big fan of Erlang, which has all of this batteries included with supervisors (they really were 20+ years ahead of their time on this stuff).

> I think there's huge difference in the needs of different implementations of queues

Yes, and people often aren't sure what their needs are. I've worked with RabbitMQ on a system that almost certainly didn't need it. And I've worked on systems that were nothing more than a file that lands via FTP on a folder and is moved (atomically, via /bin/mv) to indicate the job is pending. Using a file system as a queue is another interesting take on this sort of thing. No database even needed.

replies(1): >>cerved+qu3

>>yawara+Wo1
Correct

replies(1): >>mikekl+bH6

>>hughrr+GA
Not really knowing the details of Redis. What makes it a better key-value store than a Postgres table with a key-column and a value-column?

>>TheCoe+RP
You do not need a relational database for a simple chat/forum application.

>>obstac+vj1
You are correct. I wrote the original comment as "single server" so I assume it does not mean a meaningful scale and can be more effectively dealt with a support ticket. Not everything needs to be growth trajectory SaaS.

>>sa46+Di1
> This approach doesn’t work for LISTEN because typically a client will listen for its entire lifecycle so you can’t share connections in a pool.

But you only need one connection for LISTEN per database, total. So I'm confused why this is made out to be a big issue.

>>hyzyla+(OP)
How does this compare in terms of performance since Redis keeps everything in memory while Postgres doesn't as far as I know?

>>rantwa+PF1
You have to solve these things with a job queue system written in any other language or using any other databases as backends too though.

replies(1): >>rantwa+1O3

>>hyzyla+(OP)
A great job queue for PostgreSQL running on Node.js https://github.com/graphile/worker

I've been very happy with it.

>>rantwa+PF1
Have an automated process for deployment and a small number of people who have the keys to production deployment. Don't let developers manually change things in the production system or any system at all.

You can use pgtap to write automated unit tests. For profiling, there are explain, autoexplain, and plprofiler.

Blue/Green and Canary deployments are not currently possible with Postgres. On the other hand, Postres has transactional DDL, which means there is no downtime during code deployment and automatic rollback if something goes wrong.

The database is only for data and for stored procedures or functions, which are stored in separate schemas. Your deployment scripts, migration scripts, test scripts and everything else is not stored in the database, but in your source control system, e.g. Git.

For everything else, just use conventional software engineering practices. There is no reason to treat SQL code differently than Ruby or Java code.

replies(1): >>rantwa+XN3

>>sa46+Mv1
> [...] since network latency tends to dominate OLTP requests.

You are absolutely right about that. However, in this case we are talking about background workers. I would argue that background workers, such as an image resizing worker, are typically CPU-bound and do not perform high-volume OLTP queries. Therefore, a background worker does not require multiple database connections per thread as a web server would.

>>kccqzy+De1
What exactly is the right answer here?

>>agf+7K
So what do you propose here? I see that stackoverflow mentions some preemptive locking for a seat in cache for a few minutes. During that time the seat is taken out of the pool for potential booking. If the time lapses without the seat getting booked then the seat returns to the pool. This avoids going to the DB and see who wins.

>>mathfa+AQ1
What is SCRM?

>>bryanr+6K
Well, sure. But what if you want to avoid such a scenario? Two people booking vaccine slots shouldn’t be allowed the exact same slot. They are going to be pissed realising one of them reached the center but didn’t get the vaccine.

>>kyleee+DP
Wouldn’t the attendant be in trouble if two people at the gate start fighting for the same window seat as printed on their ticket?

>>Jailbi+lJ1
Well then the problem moves a little to the left. If there are x seats to be booked up front and y seats to be allocated later. How many times can each of the x seats be booked and how do you prevent that n and n+1 over the limit booking for some seat? The problem is still there.

replies(1): >>chx+nN6

>>des1nd+Ag
I wouldn't personally use Postgres for all of these, but have done so successfully multiple times for a decent subset:

- storing relational data (duh)

- storing JSON documents with Postgres' JSONB support - it really is very good, and being able to query relational and document data in the same query is wonderful

- storing key/value type data, where I only need to store a small amount of such data - seems silly to spin up Redis for such a small requirement

- time-series data - TimescaleDB is amazing. Performance may not be on par with a highly tuned schema with a purpose-built time series database, but it's still very, very good. It's fast even with billions of rows, has data compression, and it's really nice to be able to be able to query it just like any other Postgres tables. And the TimescaleDB folks are really helpful on Slack and GitHub. I'm a huge fan of TimescaleDB, and think it's more than adequate for a lot of time-series use cases

- full text search - Postgres shines here too! It's not as powerful as the likes of Elasticsearch, but it's still very good, and very fast. And Elasticsearch is not trivial or cheap to setup and maintain

For queues and pub/sub, RabbitMQ is my go-to solution.

>>mianos+tm
I've used RabbitMQ on several projects over the last decade or more, and have never once known it to fail.

It can be tricky to configure it you want clustering, or have fancy requirements (e.g. client certificate authentication), but once running, it's solid.

>>taf2+8E
This is most definitely correct .. until you want to scale beyond a few gigabytes of memory and then with a distributed cluster many synchronization issues don’t seem that trivial anymore.

>>abhish+2G
The "what actually happens" answer is often "let them both book successfully and then rebook the person with lower loyalty status".

>>deckar+y92
> I'd be worried about an infinite process loop here.

I mitigate this in several ways, first, only one worker can work on a job or child job. At worst, one worker is stuck in a loop. There's also a maximum limit of retries. So a job is never infinitely retried. Increasing the amount of stall time per request would be trivial but it's never become an issue. Monitoring potential infinite jobs is as simple as querying for jobs that reach the limit.

I'm sure Erlang or other actor model type implementations would handle this incredibly well at scale but it seems to me that just doing in the database works good enough for at least certain workloads.

As always, it comes down to a question of the right tool for the job. The advantage of the database is that it's mostly likely already in the stack. I don't doubt that there are situations where it's the wrong tool. That's why it's interesting to know at which point it's not a good solution, so that one can make a good decision.

>>taffer+lq2
this reminds me about a show I saw at some point where someone had put a jet engine on a motorcycle. super dangerous but super fast. asked why they did this, the answer was: "because we can!!!"

seriously, you have the actual data, you have the store procedures and you have the code. Each has their own version. I have never seen this work in a production environment. It's possible that things evolved and there is better tooling since I last tried performing this stunt.

If it works for you that's great and maybe you should write some sort of blog post (do people still blog?) describing the setup and allowing others to either 1) replicate and use it 2) poke holes in it.

replies(2): >>taffer+3I4 >>taffer+yR4

>>e_prox+Um2
no you don't. the queue system gives you a bunch of primitives and you build on top of them.

>>queich+691
We are using it quite lightly at the moment, only a few jobs per second, and with the worker running in the same thread as the publisher. The scheduled jobs feature is nice too, replacing the need for a recurring crib job. So far our experiences have been positive!

replies(1): >>queich+zY4

>>rantwa+XN3
> seriously, you have the actual data, you have the store procedures and you have the code. Each has their own version. I have never seen this work in a production environment.

How do you handle this in other languages?

replies(1): >>rantwa+H16

>>rantwa+XN3
> this reminds me about a show I saw at some point where someone had put a jet engine on a motorcycle. super dangerous but super fast. asked why they did this, the answer was: "because we can!!!"

How does this nonsense add anything to the discussion?

>>fiznoo+ID4
Thanks!

>>Dangit+uN
I see now! So the 'expired' field is for checking died/incompleted workers.

>>taffer+3I4
yo don't have stored procedures. everything is done in the code. so you only have 2 things to worry about (the db schema and the code). you still have the problem of the schema versioning but you only have to worry about 2 things

replies(1): >>taffer+tO6

>>sigil+gp1
1. I interpreted expires to be the timeout. Their could either be a stand alone transaction run periodically, that sets the expiration back to zero, or you just update the queries to read `expire < now()`.

Thus neither are running in the same transaction as anything else. The SELECT will never block, since it is not specified as "FOR update". The update itself specified the key.

On the other hand, the SKIP LOCKED approach is really nice for allowing the queue processing to be one single transaction that commits atomically, and thus nicer semantics than a timeout based retry strategy. I do like that approach. The only downside is that not all RDBMS support those semantics, but for an app that only needs to support Postgres it does look like the smart approach.

>>rantwa+Hc1
The blog post you mentions brings two arguments: first the database would require looking. Second the database would need tradeoffs between reading (polling) and writing (adding and removing to the queue).

The original article handles the first argument: Postgres doesn't need polling. Instead it provides a notify mechanism, that informs the application when the table changed (something was added or removed from the queue) via the SQL NOTIFY statement.

For the second point it also provides a solution: since optimization for reading is not needed anymore with the NOTIFY statement, the trade-off like different: we now need an efficient way to write. For this the article provides an efficient update statement with special lock behavior. This helps to make writes efficient, too.

It looks like both points from the blog post you linked are handled in the original article.

>>hughrr+Ua2
Perhaps I misinterpreted this part?

> I struggle to consider using it as a data store these days as well with other options on the table.

It sounds like you wouldn't use it for any purposes.

>>abhish+mH2
Airlines have been data mining for a very, very long time now and have a reasonably guess of the ratio of people booking vs showing up at the gate to fly. So you set your buckets to oversell by that much and assign seats later. If more people show up than expected than you throw them a voucher to go away. Hopefully enough will.

Sometimes it doesn't work out and that's when airlines bleed. I was flying home one day from some US city (was it Denver? the years and cities blend together) when a Drupal conference and some major sports thing ended the same day and the airport was a total gong show. United first offered a paltry 120 USD but when they raised to 400 USD to go home the next day instead my hand went up -- and I was such a greenhorn I didn't know they will pay for the hotel, too. A buddy pocketed no less than 1200 USD from Delta.

replies(1): >>abhish+Sj8

>>rantwa+H16
Now you have your answer: Just instead of deploying to an app server you deploy to your db server. Yet for some reason people forget everything they have learned once the term SQL comes up.

>>aidos+7s
If you have the case, if A then do B and C. You can commit to a topic and let B and C be triggered separately. This does not solve cases like storing to a db and then send a db-updated event.

>>chx+nN6
This looks like one solution but I don’t think you want this design in medicine slot booking or stock selling. You want it to be much more definite if there is no way for a conflict to be resolved. This solution won’t work there.

>>mixmas+ed1
Some kind of message bus implementation (RabbitMQ etc...) should be a must-have even in an on-premise environment. Queues are a very useful tool in almost every conceivable non-trivial software application.

zlacker

Do you really need Redis? How to get away with just PostgreSQL