(Microsoft is just as bad: their sales people can’t be bothered to talk to anyone who isn’t a partner, but that worked out great for me. I wasn’t really feeling Azure, and it made a great excuse not to consider them. One of their sales people did leave me a voicemail three or four months later, but we had already chosen another vendor by then.)
The latter part of the story took place while I was at Common Crawl, a public-good dataset that has seen a great deal of use. During my tenure there I crawled over 2.5 petabytes and 35 billion webpages, mostly by myself.
I'd always felt guilty about one specific case: our crawler hit a big name web company (top N web company) with up to 3000 requests per second* and they sent a lovely note that began with how much they loved the dataset but ended with "please stop thrashing our cache or we'll need to ban your crawler". It was difficult to fix properly given our limited engineering resources, especially as they served many tens or hundreds of thousands of domains, some of which essentially proxied requests back to them.
Knowing Google hammered you at 120k requests per second, and only backed off to _only_ 20k per second, has assuaged some portion of that guilt.
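For what it's worth, the proper fix would have been to key the politeness throttle on the resolved backend IP rather than the hostname, so the thousands of fronting domains share a single budget. A minimal single-threaded sketch of that idea (the names and the one-request-per-second figure are illustrative, not what Common Crawl actually ran; a real crawler would also need DNS caching and handling for CDNs with many IPs):

```python
import socket
import time
from collections import defaultdict

MIN_INTERVAL = 1.0  # illustrative: seconds between requests per backend IP

# backend IP -> earliest time we may fetch from it again
next_allowed = defaultdict(float)

def wait_for_slot(hostname: str) -> None:
    """Block until it's polite to hit the backend behind `hostname`."""
    # Many distinct domains may resolve to the same origin IP,
    # so throttling by IP pools their request budget together.
    ip = socket.gethostbyname(hostname)
    now = time.monotonic()
    delay = next_allowed[ip] - now
    if delay > 0:
        time.sleep(delay)
    next_allowed[ip] = max(now, next_allowed[ip]) + MIN_INTERVAL
```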
* Up to 3000 requests per second: it'd spike once every half hour or hour when parallelizing across a new set of URL seeds, then taper off, and the crawl wasn't active for the entire month.