This sounds exactly like the problem tools like pgbouncer were designed to solve. If you're on AWS one could look at RDS Proxy.
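For anyone who hasn't wired one up before: from the application's point of view a pooler is mostly just a different host/port. A minimal sketch, assuming psycopg2 and a pgbouncer listening on the conventional 6432 (the database name and credentials here are made up):

    import psycopg2

    # Connect to pgbouncer instead of Postgres directly; the pooler keeps a small
    # set of real server connections and hands them out per transaction.
    conn = psycopg2.connect(
        host="127.0.0.1",
        port=6432,            # pgbouncer's listen port (assumed default)
        dbname="app",
        user="app_user",      # hypothetical credentials
        password="secret",
    )
    with conn, conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
    conn.close()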
> Most online resources chalk this up to connection churn, citing fork rates and the pid-per-backend yada, yada. This is all true but in my opinion misses the forest from the trees. The real bottleneck is the single-threaded main loop in the postmaster. Every operation requiring postmaster involvement is pulling from a fixed pool, the size of a single CPU core. A rudimentary experiment shows that we can linearly increase connection throughput by adding additional postmasters on the same host.
We have a habit of never scheduling long-running processes at round hours, usually because those tend to be busier.
https://hakibenita.com/sql-tricks-application-dba#dont-sched...
The conclusion lists pgbouncer as one of the solutions but it does not explain it clearly.
> Many pieces of wisdom in the engineering zeitgeist are well preached but poorly understood. Postgres connection pooling falls neatly into this category. In this expedition we found one of the underlying reasons that connection pooling is so widely deployed on postgres systems running at scale. [...] an artificial constraint that has warped the shape of the developer ecosystem (RDS Proxy, pgbouncer, pgcat, etc) around it.
The artificial constraint is the single core nature of postmaster.
Other points at the end of the article that can be improved:
> we can mechanically reason about a solution.
Mechanically as in letting an AI find a solution, or as in reasoning like a mechanic, or? Furthermore:
> * Implementing jitter in our fleet of EC2 instances reduced the peak connection rate
How? Did they wait a random amount of milliseconds before sending queries to the db?
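Presumably something along those lines, though the post doesn't say. A hedged sketch of what client-side jitter usually looks like (the 30-second budget is an assumption, not their number):

    import random
    import time

    MAX_JITTER_S = 30  # assumed budget; keep it smaller than the schedule period

    def run_scheduled_job(job):
        # Sleep a random fraction of the budget so a fleet of instances that all
        # wake up at :00 doesn't open its connections in lockstep.
        time.sleep(random.uniform(0, MAX_JITTER_S))
        job()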
> * Eliminating bursts of parallel queries from our API servers
How?
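One common way, though the article doesn't spell theirs out: cap concurrency on the API side with a semaphore, so a burst of inbound requests doesn't fan out into a burst of simultaneous database connections. A sketch assuming an asyncpg-style pool:

    import asyncio

    MAX_CONCURRENT_QUERIES = 10          # assumed per-server cap

    query_slots = asyncio.Semaphore(MAX_CONCURRENT_QUERIES)

    async def run_query(pool, sql, *args):
        # Excess queries wait here instead of each grabbing its own connection.
        async with query_slots:
            async with pool.acquire() as conn:   # e.g. an asyncpg pool
                return await conn.fetch(sql, *args)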
That said, proxies aren't perfect. https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html outlines some dangers of using them (particularly when you might need session-level variables). My understanding is that PgDog does more tracking that mitigates some of these issues, but some of these are fundamental to the model. They're not a drop-in component the way other "proxies" might be.
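Concretely, the session-variable problem looks like this: under transaction pooling, consecutive statements from one client can land on different server connections, so a plain SET quietly leaks into whatever session the pooler reuses next. Scoping the setting to the transaction with SET LOCAL sidesteps it. A sketch with psycopg2 (connection details are placeholders):

    import psycopg2

    conn = psycopg2.connect("dbname=app host=127.0.0.1 port=6432")  # via the pooler

    # Risky behind a transaction-mode pooler:
    #   cur.execute("SET statement_timeout = '5s'")
    # The setting sticks to whichever server connection ran it, and a different
    # client may be handed that connection next.

    # Safer: scope the setting to the current transaction only.
    with conn, conn.cursor() as cur:     # psycopg2 opens/commits a transaction block
        cur.execute("SET LOCAL statement_timeout = '5s'")
        cur.execute("SELECT 1")          # runs with the timeout, which is then discarded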
you can keep things synced across databases easily and keep it super duper simple.
As in it's fully characterized, so you can use only math and logic rather than relying on experience and guesswork.
Since Postgres is a mature project, this is a non-trivial effort. See the Postgres wiki for some context: https://wiki.postgresql.org/wiki/Multithreading
But, I'm hopeful that in 2-3 years from now, we'll see this bear fruition. The recent asynchronous read I/O improvements in Postgres 18 show that Postgres can evolve, one just needs to be patient, potentially help contribute, and find workarounds (connection pooling, in this case).
I'm actually surprised that it handled that many connections. The data implies that they have 4000 new connections/sec...but is it 4000 connections handled/sec?
This won't work :) echo will run as root but the redirection still runs as the unprivileged user. It needs to be run from a privileged shell, or via something like sudo sh -c "echo $NUM_PAGES > /proc/sys/vm/nr_hugepages"
The point gets across, though, technicality notwithstanding.
echo $NUM_PAGES | sudo tee /proc/sys/vm/nr_hugepages
I've always found it odd that there isn't a standard command to write stdin to a file that doesn't also write it to stdout. Or that tee doesn't have an option to suppress writing to stdout.
> write stdin to a file that doesn't also write it to stdout
You mean like "dd of=/path/file" ?
And that is assuming you have a solution for things like balancing, and routing to the correct shard.
> The real bottleneck is the single-threaded main loop in the postmaster.
A single-threaded event loop can do a lot of stuff. Certainly handle 4000 tasks of some sort in under 10s. Just offhand it seems like it would be eminently possible to handle incoming connections on the scale they describe in a single-threaded event loop.
Clearly the existing postgres postmaster thread is a bottleneck as it is implemented today. But I'd be interested to go deeper into what it's doing that causes it to be unable to keep up with a fairly low workload vs. what is possible to do on a single thread/core.
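For a rough sense of scale, a toy single-threaded accept loop will happily take far more than 4000 connections in 10 seconds; it just does none of the real work a backend startup does (fork, auth, catalog access), so treat it as an upper bound rather than a fair comparison. A sketch (the port is arbitrary):

    import asyncio

    accepted = 0

    async def handle(reader, writer):
        global accepted
        accepted += 1
        writer.close()                 # accept, count, hang up

    async def main():
        server = await asyncio.start_server(handle, "127.0.0.1", 5433)
        async with server:
            await asyncio.sleep(10)    # keep accepting for 10 seconds
        print(f"accepted {accepted} connections in 10s on one thread")

    asyncio.run(main())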
If people are building things which actually require massive amounts of data stored in databases they should be able to charge accordingly.
My first thought was "why even use big databases, you have the perfect workload to shard across a bunch of instances, and as a bonus any downtime would only affect a smaller portion of customers"
They are cheap if you use a tiny fraction of a server for $20/mo, or if you have 50 engineers working on the code
Yes, currently, putting PgBouncer in the middle helps handle massive sudden influxes of potentially short lived connections. That is indeed the correct current best practice. But I hardly fault the author for wishing that postmaster could be configured to run on multiple cores so that the additional complexity of running PgBouncer for this relatively simple use-case could be eliminated.
> Most meetings start on the hour, some on the half, but most on the full. It sounds obvious to say it aloud, but the implication of this has rippled through our entire media processing infrastructure.
When you can control when it happens, you can often jitter things. For instance, the naive approach of rate limiting users with windows that reset at quantized times (e.g. the minute, the hour) leads to every client coming back at the same time. The solution there is to apply a stable jitter so different clients get different reset times.
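A small sketch of what stable jitter can look like (the window size and hashing choice are just illustrative): hash the client id into a fixed offset within the window, so each client always resets at the same shifted time while the fleet as a whole spreads out.

    import hashlib

    WINDOW_S = 3600  # hourly reset window (illustrative)

    def reset_offset(client_id: str) -> int:
        # Same client -> same offset every time; different clients spread evenly.
        digest = hashlib.sha256(client_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % WINDOW_S

    def next_reset(now: int, client_id: str) -> int:
        offset = reset_offset(client_id)
        window_start = (now - offset) // WINDOW_S * WINDOW_S + offset
        return window_start + WINDOW_S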
That pattern does not go all that well with meetings, as they need to happen when they happen, which is going to be mostly on the hour, the half hour, and so on. However, the lead-up time to those meetings is often quite long, so you can do the work that should happen on the hour quite a bit ahead of time and then apply the changes in one large batch on the minute.
You have similar problems quite often with things like weekly update emails. At scale it can take a lot of time to prepare all the updates, often more than 12 hours. But you don't want the mails to arrive at different times of the day, so you really need to get the reports prepared ahead of time and then send them all out together when the target time arrives.
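In code, that two-phase shape is basically this (all of the helpers and the send time are assumptions, just to show the structure):

    import time

    def send_weekly_updates(users, build_report, send_email, send_at_epoch):
        # Phase 1: the slow part (report generation) runs whenever capacity allows,
        # possibly starting many hours before the target send time.
        prepared = [(user, build_report(user)) for user in users]

        # Phase 2: hold until the agreed time, then flush the whole batch so the
        # mails land at roughly the same moment for everyone.
        time.sleep(max(0, send_at_epoch - time.time()))
        for user, report in prepared:
            send_email(user, report)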
If you happen to have moreutils installed, you can do that with pee
echo $NUM_PAGES | sudo pee 'cat > /proc/sys/vm/nr_hugepages'
It’s likely an actual CPU would’ve handled this load just fine.
[1] https://github.com/puppetlabs/puppet/blob/main/lib/puppet/pa...
Why use something portable when you can use cat pee?
They probably don't even need a database anyway for data that is likely write once, read many. You could store the JSON of the meeting in S3. It's not like people are going back in time and updating meeting records. It's more like a log file and logging systems and data structures should be enough here. You can then take that data and ingest it into a database later, or some kind of search system, vector database etc.
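Something like this, if you went the S3 route (bucket name and key layout are made up, using boto3):

    import json
    import boto3

    s3 = boto3.client("s3")

    def record_meeting(meeting: dict) -> None:
        # Write-once, read-many: key by meeting id and never update in place.
        s3.put_object(
            Bucket="meeting-archive",                  # hypothetical bucket
            Key=f"meetings/{meeting['id']}.json",
            Body=json.dumps(meeting).encode(),
            ContentType="application/json",
        )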
Database connections are designed this way on purpose; it's why connection pools exist. This design is suboptimal.
Fun quick anecdote: a friend of mine worked at an EA subsidiary when Sim City (2013) was released, to great disaster as the online stuff failed under load. He got shifted over to the game a day after release to firefight their server stuff. He was responsible for the most dramatic initial improvement when he discovered the servers weren't using connection pooling, and were instead opening a new connection for almost every single query, using up all the connections on the back-end DB. EA's approach had been "you're programmers, you could build the back end", not accepting the game devs accurately telling them it was a distinct skill set.
Ya, right. Just make up some reason for not following best practices.
And probably `echo -o output-file` as well.
The problem of resource usage for many connections is real.
I know it's useful for other things, but it has become a fearful instinct at this point.
https://www.freedesktop.org/software/systemd/man/latest/syst...
But transactions aren't processing queries all the time. Often the application will do processing between sending queries to the database. During that time a transaction is open, but doesn't do any work on the database server.
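Which is exactly the idle-in-transaction pattern that pins a backend for no reason. A sketch of the fix on the application side, with psycopg2 and made-up table names: keep the transactions wrapped around the database work only, and do the app-side crunching in between.

    import psycopg2

    conn = psycopg2.connect("dbname=app")              # placeholder DSN

    # Short transaction #1: just fetch.
    with conn, conn.cursor() as cur:
        cur.execute("SELECT id, total FROM orders WHERE day = CURRENT_DATE")
        rows = cur.fetchall()

    # App-side work happens with no transaction open, so the backend isn't pinned.
    summary = sum(total for _id, total in rows)

    # Short transaction #2: just write.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO daily_summary (day, total) VALUES (CURRENT_DATE, %s)",
            (summary,),
        )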
I believe they're just referring to having several completely-independent postgres instances on the same host.
In other words: say that postgres is maxing out at 2000 conns/sec. If the bottleneck actually was fork rate on the host, then having 2 independent copies of postgres on a host wouldn't improve the total number of connections per second that could be handled: each instance would max out at ~1000 conns/sec, since they're competing for process-spawning. But in reality that isn't the case, indicating that the fork rate isn't the bottleneck.
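The experiment is easy enough to approximate yourself. A rough sketch (ports, database name, and connection counts are assumptions; the second instance is just another postgres started on 5433): churn connections against one instance, then against two in parallel, and compare the aggregate rate.

    import time
    import psycopg2
    from concurrent.futures import ThreadPoolExecutor

    def churn(port: int, n: int) -> None:
        # Open and immediately close n connections against one instance.
        for _ in range(n):
            psycopg2.connect(host="127.0.0.1", port=port, dbname="postgres").close()

    def conns_per_sec(ports, n_per_port=2000) -> float:
        start = time.time()
        with ThreadPoolExecutor(max_workers=len(ports)) as pool:
            list(pool.map(lambda p: churn(p, n_per_port), ports))
        return len(ports) * n_per_port / (time.time() - start)

    print("one instance :", conns_per_sec([5432]))
    print("two instances:", conns_per_sec([5432, 5433]))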
What you describe makes sense, of course, but few can build it without it being drastically worse than abusing a database like postgres. It's a sad state of affairs.
there are many reasons to need something like pgbouncer.
1) not all workloads lend themselves to backend pools
2) not all backends can afford pooling
3) you don't always control your backend
4) you might have a lot of different backends connecting to the database. it's much simpler to just tell them to connect to pgbouncer.
5) you don't want to trust the backend team not to bring postgresql down with many simultaneous connections.
6) backend pools often misbehave
people don't wake up one day and just add another cog to their infrastructure randomly.