River: A fast, robust job queue for Go and Postgres

>>bo0tzz+(OP)
If I were going to build my own job queue, I'd implement it more like GCP Tasks [0].

It is such a better model for the majority of queues. All you're doing is storing a message, hitting an HTTP endpoint, and deleting the message on success. This makes it so much easier to scale, reason about, and test task execution.

Update, since multiple people seem confused: I'm talking about the implementation of a job queue system, not suggesting that they use the GCP Tasks product. That said, I would have just used GCP Tasks too (assuming the use case dictated it; it's a fantastic and rock-solid product).

[0] https://cloud.google.com/tasks

>>bo0tzz+(OP)
Looks great. For people wondering about whether Postgres really is a good choice for a job queue, I can recommend checking out Oban in Elixir, which has been running in production for many years: https://github.com/sorentwo/oban

Benchmark: peaks at around 17,699 jobs/sec for one queue on one node. Probably covers most apps.

https://getoban.pro/articles/one-million-jobs-a-minute-with-...

>>latchk+sg
Do you know that brandur's been writing about Postgres job queues since at least 2017? Cut him some slack.

https://brandur.org/job-drain

>>15294722

>>politi+si
"I'm into effective altruism and created the largest crypto exchange in the world. Cut me some slack."

No, we don't operate like that. Call me out when I'm wrong technically, but don't tell me that because someone is some sort of celebrity that I should cut them some slack.

Everything he pointed out is literally covered in the GCP Tasks documentation.

https://cloud.google.com/tasks/docs/dual-overview

https://cloud.google.com/tasks/docs/common-pitfalls

>>bo0tzz+(OP)
Hi HN, I'm one of the authors of River along with Brandur. We've been working on this library for a few months and thought it was about time we get it out into the world.

Transactional job queues have been a recurring theme throughout my career as a backend and distributed systems engineer at Heroku, Opendoor, and Mux. Despite the problems with non-transactional queues being well understood I keep encountering these same problems. I wrote a bit about them here in our docs: https://riverqueue.com/docs/transactional-enqueueing

Ultimately I want to help engineers be able to focus their time on building a reliable product, not chasing down distributed systems edge cases. I think most people underestimate just how far you can get with this model—most systems will never outgrow the scaling constraints and the rest are generally better off not worrying about these problems until they truly need to.

Please check out the website and docs for more info. We have a lot more coming but first we want to iron out the API design with the community and get some feedback on what features people are most excited for. https://riverqueue.com/

>>victor+Zh
Oban is fantastic and has been a huge source of inspiration for us, showing what is possible in this space. In fact I think during my time at Distru we were one of Parker's first customers with Oban Web / Pro :)

We've also had a lot of experience with other libraries like Que (https://github.com/que-rb/que) and Sidekiq (https://sidekiq.org/), which have certainly influenced us over the years.

>>politi+si
2015, even :) https://brandur.org/postgres-queues

>>bo0tzz+(OP)
I love PG job queues!

They’re surprisingly easy to implement in plain SQL:

[1] https://taylor.town/pg-task

The nice thing about this implementation is that you can query within the same transaction window.

>>bo0tzz+(OP)
Nice, I've been using graphile-worker [0] for a while now, and it handles our needs perfectly, so I can totally see why you want something in the Go world.

Just skimming the docs, can you add a job directly via the DB? So a native trigger could add a job in? Or does it have to go via a client?

[0] https://worker.graphile.org/

>>surpri+ar
Agreed. Shortwave [1] is built completely on this, but with the added layer of a per-user leasing system on top of the tasks. So you only need to `SKIP LOCKED` to grab a lease, then you can grab as many tasks as you want and process them in bulk. It allows higher throughput of tasks, and was also required for the use case, as the leases were tied to a user and tasks for a single user must be processed in order.

[1]: https://www.shortwave.com/

>>bo0tzz+(OP)
Looks cool and thanks for sharing. Founder of windmill.dev here, an open-source, extremely fast workflow engine to run jobs in ts, py, go, sh, whose most important piece, the queue, is also just Rust + PostgreSQL (and mostly FOR UPDATE SKIP LOCKED).

I'd be curious to compare performance once you guys are comfortable with that. We run our benchmarks openly and every day at: https://github.com/windmill-labs/windmill/tree/benchmarks

I wasn't aware of the skip B-tree splits and the REINDEX CONCURRENTLY tricks. But I'm curious what you index in your jobs that uses those. We mostly rely on the tag/queue_name (which has a small cardinality), scheduled_for, and a running boolean, which don't seem like a good fit for B-trees.

>>bo0tzz+(OP)
If you are on Kafka already, there is an alternative to schedule a job without PG [0]

[0] https://www.wgtwo.com/blog/kafka-timers/

>>bo0tzz+(OP)
We are looking right now to use a stable PG job queue built in Go. We have found 2 already existing ones:

* neoq: https://github.com/acaloiaro/neoq

* gue: https://github.com/vgarvardt/gue

Neoq is new and we found it to have some features (like scheduling tasks) that were attractive. The maintainer has also been responsive to fixing our bug reports and addressing our concerns as we try it out.

Gue has been around for a while and is probably serving its users well.

Looking forward to trying out River now. I do wonder if neoq and river might be better off joining forces.

>>bo0tzz+(OP)
> Work in a transaction has other benefits too. Postgres’ NOTIFY respects transactions, so the moment a job is ready to work a job queue can wake a worker to work it, bringing the mean delay before work happens down to the sub-millisecond level.

Oban just went the opposite way, removing the use of database triggers for insert notifications and moving them into the application layer instead [1]. Given the prevalence of poolers like pgbouncer, which prevent NOTIFY from ever triggering, and the extra DB load of trigger handling, it wasn't worth it.

[1]: https://github.com/sorentwo/oban/commit/7688651446a76d766f39...

>>bo0tzz+(OP)
This looks like a great effort and I am looking forward to trying it out.

I am a bit confused by the choice of the LGPL 3.0 license. It requires one to dynamically link the library to avoid GPL's virality, but in a language like Go that statically links everything, it becomes impossible to satisfy the requirements of the license, unless we ignore what it says and focus just on its spirit. I see that this was discussed previously by the community in posts such as these [1][2][3].

I am assuming that bgentry and brandur have strong thoughts on the topic since they avoided the default Go license choice of BSD/MIT, so I'd love to hear more.

[1] https://www.makeworld.space/2021/01/lgpl_go.html [2] https://golang-nuts.narkive.com/41XkIlzJ/go-lgpl-and-static-... [3] https://softwareengineering.stackexchange.com/questions/1790...

>>bgentr+9l
How does this compare to https://github.com/vgarvardt/gue?

>>bo0tzz+(OP)
The number of features lifted directly from Oban[1] is astounding, considering there isn't any attribution in the announcement post or the repo.

Starting with the project's tagline, "Robust job processing in Elixir", let's see what else:

  - The same job states, including the British spelling for `cancelled`
  - Snoozing and cancelling jobs inline
  - The prioritization system
  - Tracking where jobs were attempted in an attempted_by column
  - Storing a list of errors inline on the job
  - The same check constraints and the same compound indexes
  - Almost the entire table schema, really
  - Unique jobs with the exact same option names
  - Table-backed leadership election

Please give some credit where it's due.

[1]: https://github.com/sorentwo/oban

>>endorp+fW
Or neoq. >>38352778

>>JoshGl+cA
You've found an underdocumented feature, but in fact River does already do what you're asking for! Check out `ScheduledAt` on the `InsertOpts`: https://pkg.go.dev/github.com/riverqueue/river#InsertOpts

I'll try to work this into the higher level docs website later today with an example :)

>>endorp+fW
Not familiar with either project, but it seems gue is a fork of the author's previous project, https://github.com/bgentry/que-go

>>bgentr+LJ
If you could build a UI similar to Hangfire [0] or Laravel Horizon [1], that would be awesome.

[0] https://hangfire.io

[1] https://github.com/laravel/horizon

>>bo0tzz+(OP)
I wrote our own little Go and Postgres job queue similar in spirit. Some tricks we used:

- Use FOR NO KEY UPDATE instead of FOR UPDATE so you don't block inserts into tables with a foreign key relationship with the job table. [1]

- We parallelize workers by tenant_id but process a single tenant sequentially. I didn't see anything in the docs about that use case; might be worth some design time.

[1]: https://www.migops.com/blog/select-for-update-and-its-behavi...

>>bo0tzz+(OP)
Awesome! Seems like this would be a lot easier to work with and perhaps more performant than Skye's pg-queue? Queue workload is a lot like OLTP, which IMO, makes Postgres great for it (but does require some extra tuning).

Unlike https://github.com/tembo-io/pgmq, a project we've been working on at Tembo, many queue projects still require you to run and manage a process external to the database, like a background worker. Or they ship as a client library and live in your application, which will limit the languages you can choose to work with. PGMQ is a pure SQL API, so any language that can connect to Postgres can use it.

>>bojanz+NV
Hi bojanz, to be honest we were not well informed enough on the licensing nuances. I appreciate you sharing these links. Please tune into this GitHub issue, where we'll give updates soon and make sure any ambiguity is resolved: https://github.com/riverqueue/river/issues/47

>>Thaxll+nZ1
I’d argue strongly that Oban did invent things, including parts of the underlying structure used in River, and the authors agree that it was a heavy influence.

While there is no overlap in technology or structure with Sidekiq, the original Oban announcement on the ElixirForum mentions it along with all of the direct influences:

https://elixirforum.com/t/oban-reliable-and-observable-job-p...

>>bo0tzz+(OP)
Interesting, I would have thought a solution like https://temporal.io/ would be more appropriate for these use cases.

a job queue might just be the tip of the use-case iceberg... isn't it?

in the end it's pub/sub - I use nats.io workers for this.

arf, just read a few comments along the same lines down below.

>>bo0tzz+(OP)
Nice problem observation.

One solution is the outbox pattern:

https://microservices.io/patterns/data/transactional-outbox....
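For illustration, a minimal outbox write in Go with `database/sql` might look like the sketch below. The `orders` and `outbox` tables, their columns, the `order.placed` topic, and the function names are all assumptions invented for this example. The point is only that the business row and the outgoing message are written in one transaction, so either both are committed or neither is.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// Hypothetical schema:
//   orders(id bigserial primary key, customer text, total_cents bigint)
//   outbox(id bigserial primary key, topic text, payload text)
const (
	insertOrderSQL  = `INSERT INTO orders (customer, total_cents) VALUES ($1, $2) RETURNING id`
	insertOutboxSQL = `INSERT INTO outbox (topic, payload) VALUES ($1, $2)`
)

// outboxPayload builds the message body recorded alongside the order.
func outboxPayload(orderID int64) string {
	return fmt.Sprintf(`{"order_id":%d}`, orderID)
}

// PlaceOrder writes the order and its outbox message atomically. A separate
// relay process (not shown) would later read the outbox table and publish.
func PlaceOrder(ctx context.Context, db *sql.DB, customer string, totalCents int64) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // harmless after a successful Commit

	var orderID int64
	if err := tx.QueryRowContext(ctx, insertOrderSQL, customer, totalCents).Scan(&orderID); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx, insertOutboxSQL, "order.placed", outboxPayload(orderID)); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	// No database is opened here; this only shows the message that would
	// be stored for a hypothetical order with ID 42.
	fmt.Println(outboxPayload(42)) // prints {"order_id":42}
}
```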
