zlacker

[parent] [thread] 6 comments
1. miki12+(OP)[view] [source] 2023-05-31 20:48:07
This feels like it's all priced for AI companies, TBH. The per-request pricing makes a LOT more sense if you assume that any given piece of content will only be requested once in your company's history, saved on a server somewhere, and used for training forever. You're not paying for a request being processed, and you're not even paying to offset any advertising cost; you're essentially paying for the ability to use the requested piece of content forever. Maybe that's what Apollo should do: set up a huge cache layer and proxy all requests for public data through it. I feel like the power law would apply here, so 80% of the requests would be for 20% of the content. Considering how popular the most popular subreddits are, I wouldn't be surprised if the balance was something like 99-to-1. Cache misses would still need to be fetched from the API itself, but caching should drive costs down massively.

If the ToS allow this, the cache layer could even be shared across apps from different developers (developers supporting both iOS and Android might have an advantage here), making the costs even lower.
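
Roughly what I have in mind, as a minimal sketch. The upstream URL, TTL, and endpoint shape are all made up, and it glosses over auth, rate limits, and cache eviction entirely:

  # Minimal read-through cache proxy sketch (all names/values hypothetical).
  import time

  import requests
  from flask import Flask, jsonify

  UPSTREAM = "https://oauth.reddit.com"  # hypothetical upstream base URL
  TTL_SECONDS = 60                       # made-up cache lifetime

  app = Flask(__name__)
  _cache = {}  # path -> (fetched_at, payload)

  @app.route("/proxy/<path:path>")
  def proxy(path):
      now = time.time()
      hit = _cache.get(path)
      if hit and now - hit[0] < TTL_SECONDS:
          return jsonify(hit[1])          # cache hit: no upstream request
      resp = requests.get(f"{UPSTREAM}/{path}", timeout=10)
      resp.raise_for_status()
      payload = resp.json()
      _cache[path] = (now, payload)       # cache miss: one paid API call
      return jsonify(payload)

  if __name__ == "__main__":
      app.run(port=8080)

With the power-law assumption, most traffic hits the hot entries, so only the long tail generates paid upstream calls.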

replies(3): >>paragr+Fp >>gsk22+CB >>DrammB+aN
2. paragr+Fp[view] [source] 2023-05-31 23:13:16
>>miki12+(OP)
I feel like the better pricing strategy would be something similar to what geospatial API platforms like Google Maps do, with their explicit no-caching or time-limited-caching clauses. E.g., you're actually prohibited from retaining the results of, say, a geocode beyond 30 days, IIRC.

Amazon made this explicit with their Geospatial API pricing (https://aws.amazon.com/location/pricing/ - "Places" tab), where the pricing for being able to store a result is 8x higher.

replies(1): >>miki12+Uz
3. miki12+Uz[view] [source] [discussion] 2023-06-01 00:40:40
>>paragr+Fp
This really doesn't work in the context of AI training, though. Sure, it would perhaps make reuse between models a lot harder, but the general idea still holds: once a model is trained, it exists forever.
4. gsk22+CB[view] [source] 2023-06-01 00:59:48
>>miki12+(OP)
Would you use a client where the most popular threads are constantly out of date due to caching? I certainly wouldn't.
replies(1): >>noitpm+ZE
5. noitpm+ZE[view] [source] [discussion] 2023-06-01 01:32:45
>>gsk22+CB
This wouldn't really be an issue; even a simple caching layer would still be a massive advantage.

Imagine you have 10 requests for thread {X} every second (probably a massive underestimation of the actual traffic). If you cache that single thread with a lifetime of one second, you've instantly cut out 90% of your API usage for that thread.

Obviously the final benefit depends on the actual distribution of {users} per {thread} per {time}, but if your goal is to shave off redundant API requests then it definitely makes sense, especially if the alternative is untenable in terms of cost.
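
Back-of-envelope version of that, using the made-up numbers above:

  # Toy model: requests saved by a TTL cache on a single hot thread.
  # Both numbers are the hypothetical ones from this comment.
  requests_per_second = 10   # assumed traffic for one hot thread
  ttl_seconds = 1            # cache lifetime

  # Within each TTL window, only the first request goes upstream.
  upstream_per_second = 1 / ttl_seconds
  saved = 1 - upstream_per_second / requests_per_second
  print(f"API calls avoided: {saved:.0%}")  # -> 90%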

replies(1): >>gsk22+oP
6. DrammB+aN[view] [source] 2023-06-01 03:10:02
>>miki12+(OP)
As soon as there's anything mirroring Reddit and allowing third-party apps to circumvent API pricing, the ToS would be updated to disallow it.
7. gsk22+oP[view] [source] [discussion] 2023-06-01 03:35:15
>>noitpm+ZE
My point is that for caching to be cost-effective, it requires long cache lifetimes, especially if your # of users is relatively low.

Especially for an app like Reddit, with millions of subreddits. There is no monolithic "reddit"; the experience is tailored to each user based on the subs they've joined. So the % of requests that ask for a cached resource is lower than on other high-volume websites. I think your 99-to-1 estimate is _way_ off.
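
As a toy model (all numbers invented): even if shared thread pages cache at 99%, a workload dominated by personalized feed requests caps the overall hit rate far lower:

  # Toy model: overall cache hit rate when some requests are per-user
  # (personalized front pages, effectively uncacheable) and the rest are
  # shared (thread pages). All parameters are invented for illustration.
  def overall_hit_rate(frontpage_share, thread_hit_rate):
      # Personalized requests always miss; shared ones hit at thread_hit_rate.
      return (1 - frontpage_share) * thread_hit_rate

  for frontpage_share in (0.2, 0.5, 0.8):
      print(frontpage_share, f"{overall_hit_rate(frontpage_share, 0.99):.0%}")
  # -> 0.2 79%, 0.5 50%, 0.8 20%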
