Enterprise deals between these user-generated content platforms and LLM platforms may well involve many billions of API requests, and the pricing is likely an order of magnitude less expensive per call due to the volume. The result is a per-call price that is prohibitive at smaller scales, and undoubtedly the UGC platform operators are aware that they're pricing out third-party applications like Apollo and Pushshift. These operators need high baseline pricing so they can discount in negotiation with LLM clients.
Or, perhaps, it's the opposite: for instance, Reddit could be developing its own first-party language model, and any other model with access to semi-realtime data is a potentially existential competitor. The best strategic route is to make it economically infeasible for some hypothetical competitor to arise, while still generating revenue from clients willing to pay these much higher rates.
Ultimately, this seems to be playing out as the endgame of the open internet v. corporate consolidation, and while it's unclear who's winning, I think it's pretty obvious that most of us are losing.
The web is rapidly filling up with AI-regurgitated garbage. Eventually there's going to be only a handful of sites left with real, usable content on them, Reddit being one of the biggest.
This is already the case. See the oceans of crap SEO-optimized "food recipe sites". It's unbearable.
So sad. Back in the 1990s and 2000s, there were so many random sites to visit with interesting things. Search engines were of actual use.
Now, it's basically Facebook, Reddit, Pinterest, Instagram, Stack Overflow, and a handful of others, depending on what you like. And EVERYTHING is monetized.
The WWW of today is terrible.
Now