How do you know the Postgres implementation is naive? I've worked on several analytics platforms, including offshoots of Google Analytics within Google itself, and this problem domain is ridiculously easy to shard on natural partitions. And after sharding, you can start to do roll-ups, which Google Analytics does internally.
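
To make "roll-ups" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (raw_hits, daily_rollup), the columns, and the helper functions are all made up for illustration; this is not how Google Analytics or any particular product does it, just the general idea of pre-aggregating raw events so queries read the small table.

```python
import sqlite3

def make_raw_hits(conn, rows):
    # Hypothetical raw event table: one row per page hit.
    conn.execute("CREATE TABLE raw_hits (customer_id TEXT, day TEXT, path TEXT)")
    conn.executemany("INSERT INTO raw_hits VALUES (?, ?, ?)", rows)
    conn.commit()

def build_daily_rollup(conn):
    # Pre-aggregate raw hits into one row per (customer, day).
    conn.execute("DROP TABLE IF EXISTS daily_rollup")
    conn.execute(
        """
        CREATE TABLE daily_rollup AS
        SELECT customer_id, day, COUNT(*) AS hits
        FROM raw_hits
        GROUP BY customer_id, day
        """
    )
    conn.commit()

def hits_for(conn, customer_id, day):
    # Queries read the roll-up, never the raw table.
    rows = conn.execute(
        "SELECT hits FROM daily_rollup WHERE customer_id = ? AND day = ?",
        (customer_id, day),
    ).fetchall()
    return rows[0][0] if rows else 0
```

Dashboards then read daily_rollup, and the raw table is only scanned when the roll-up is rebuilt.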

By 2014, when I left, we had a few petabytes of analytics data for a very small but high-traffic set of customers. Could we query all of that at once within a reasonable online SLA? No. We partitioned and sharded the data easily and only queried the partitions we needed.
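
A toy sketch of "only query the partitions you need", assuming the natural partition key is (customer, day). The PartitionedStore class and its methods are hypothetical, and an in-memory dict stands in for what would really be separate tables or shards.

```python
from collections import defaultdict
from datetime import date, timedelta

class PartitionedStore:
    """Hypothetical store keyed by the natural partition (customer_id, day)."""

    def __init__(self):
        # (customer_id, day) -> list of page paths hit in that partition
        self.partitions = defaultdict(list)

    def record(self, customer_id, day, path):
        self.partitions[(customer_id, day)].append(path)

    def count_hits(self, customer_id, start, end):
        # Walk only the partitions for this customer and date range;
        # every other customer's data is never touched.
        total = 0
        touched = 0
        day = start
        while day <= end:
            key = (customer_id, day)
            if key in self.partitions:
                total += len(self.partitions[key])
                touched += 1
            day += timedelta(days=1)
        return total, touched
```

The second return value is the number of partitions actually read, which is the number that matters for the SLA, not the total size of the store.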

If I were to do this now and didn't need near real-time (what is real-time?) I'd use SQLite. Otherwise I'd use trickle-n-flip on Postgres or MySQL. There are literally 10+ year-old books[1] on this wrt RDBMS.
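
For anyone who hasn't met trickle-n-flip: as I understand the pattern, new rows trickle into a staging copy of a table while readers keep hitting the live one, and every so often the two are swapped. A rough sketch with Python's sqlite3 follows; the table names (hits_live, hits_staging) and the helpers are assumptions for illustration, and on Postgres or MySQL the swap would use that database's own rename mechanics.

```python
import sqlite3

def open_store(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode so that
    # flip() can manage its own explicit transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE hits_live (customer_id TEXT, day TEXT, hits INTEGER)")
    conn.execute("CREATE TABLE hits_staging (customer_id TEXT, day TEXT, hits INTEGER)")
    return conn

def trickle(conn, rows):
    # New data only ever lands in staging; readers of hits_live are unaffected.
    conn.executemany(
        "INSERT INTO hits_staging (customer_id, day, hits) VALUES (?, ?, ?)", rows
    )

def flip(conn):
    # Swap staging into place in one transaction, then start the next
    # staging table as a copy of the new live table.
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE hits_live RENAME TO hits_old")
        conn.execute("ALTER TABLE hits_staging RENAME TO hits_live")
        conn.execute("DROP TABLE hits_old")
        conn.execute("CREATE TABLE hits_staging AS SELECT * FROM hits_live")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

How often to flip is the "how real-time is real-time" knob: the live table is only ever as fresh as the last flip.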

And yes, even with 2000 clients reaching billions of requests per day, only the top few stressed the system. The rest is long tail.

1. https://www.amazon.com/Data-Warehousing-Handbook-Rob-Mattiso...

replies(1): >>curun1+ik1

>>epicmu+(OP)
There's a comment elsewhere in this thread where he talks about his backend. He didn't explicitly say it was naive, but he definitely gave off that vibe. Is it possible to use Postgres in a sophisticated way to work as an analytics store? Sure. Timescale does it and gives you the majority of what you'd need. But it's hard to get right, and the creator hasn't given the impression that he's well-versed in this space.