zlacker

[return to "Terraria on Stadia cancelled after developer's Google account gets locked"]
1. rochak+W1[view] [source] 2021-02-08 08:31:25
>>benhur+(OP)
Is there really no way for a user to get in touch with a human agent? I've read that Google automates the flagging and disabling of accounts, but given how many people have their livelihoods tied to these accounts, surely Google must have put something in place for these cases. It scares me how deep I've gotten into the Google ecosystem. Time and time again I think about migrating somewhere else, but I don't know how to; it seems too daunting.
◧◩
2. zxcvbn+N5[view] [source] 2021-02-08 09:11:54
>>rochak+W1
Google only takes calls for ad sales and G Suite support, as far as I know. Beyond that, shaming them on social media is the only way to get their attention. I used to work for a top-five website, and even we couldn't get hold of anyone. One day Google decided to start crawling us at 120k requests per second, and it was killing the site by pulling ancient content that was a 100% cache miss. There was no way for us to get in touch with Google officially, and our billionaire CEO hadn't traded numbers with their billionaire CEO, so no help there. In the end, one of the developers had a college buddy who had landed at Google, and that guy was able to use some sort of internal mailing list to get them to drop the crawl rate down to 20k rps.
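
For what it's worth, the only knob on our side would have been throttling that traffic ourselves at the edge. A rough sketch of that kind of shared token bucket for crawler traffic (the rate, burst, and user-agent check here are made up for illustration, not what we actually ran):

    import threading
    import time

    # Sketch of an edge throttle: one token bucket shared by everything
    # claiming to be Googlebot. Rate and burst are illustrative numbers.

    class TokenBucket:
        def __init__(self, rate: float, burst: float):
            self.rate = rate              # tokens refilled per second
            self.capacity = burst         # maximum bucket size
            self.tokens = burst
            self.updated = time.monotonic()
            self.lock = threading.Lock()

        def allow(self) -> bool:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return True
                return False

    googlebot_bucket = TokenBucket(rate=20_000, burst=40_000)

    def should_serve(user_agent: str) -> bool:
        # Crawler traffic shares one budget; everything else passes untouched.
        if "Googlebot" in user_agent:
            return googlebot_bucket.allow()
        return True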

(Microsoft is just as bad: their salespeople can't be bothered to talk to anyone who isn't a partner. That worked out great for me, though; I wasn't really feeling Azure, and it made a great excuse not to consider them. One of their salespeople did leave me a voicemail three or four months later, but we had already chosen another vendor by then.)

◧◩◪
3. Smerit+nc[view] [source] 2021-02-08 10:11:21
>>zxcvbn+N5
I've written in the past about my experiences with crawling[1], from accidentally getting banned by Slashdot as a teenager doing linguistic analysis, to accidentally DoS'ing a major website, to being threatened with lawsuits.

The latter parts of that story are from my time at Common Crawl, a public-good dataset that has seen a great deal of use. During my tenure there I crawled over 2.5 petabytes and 35 billion webpages, mostly by myself.

I'd always felt guilty about one specific case: our crawler hit a big-name web company (a top-N web company) with up to 3,000 requests per second*, and they sent a lovely note that began with how much they loved the dataset but ended with "please stop thrashing our cache or we'll need to ban your crawler". It was difficult to fix properly given our limited engineering resources, because they fronted many tens or hundreds of thousands of domains, some of which essentially proxied requests back to them.

Knowing that Google hammered you at 120k requests per second, and only came down to 20k per second, has assuaged some portion of that guilt.

[1]: https://state.smerity.com/smerity/state/01EAN3YGGXN93GFRM8XW...

* Up to 3,000 requests per second: it would spike once every half hour or hour when parallelizing across a new set of URL seeds and then decrease, and the crawl wasn't active for the whole month.
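
For anyone wondering what properly fixing it would have meant: the crawl was presumably polite per domain, but a provider fronting hundreds of thousands of domains absorbs the sum of all of those budgets. A rough sketch of the missing piece, rate limiting per backend rather than per host (the IP-based grouping and the delay value are illustrative assumptions, not Common Crawl's actual logic):

    import socket
    import threading
    import time
    from collections import defaultdict
    from urllib.parse import urlsplit

    # Enforce politeness per *backend* rather than per hostname, so thousands
    # of domains fronted by one provider share a single budget. Grouping by
    # resolved IP is a crude stand-in for a real host -> provider mapping.

    MIN_DELAY_PER_BACKEND = 0.5  # seconds between requests to any one backend

    _last_hit = defaultdict(float)
    _lock = threading.Lock()

    def backend_key(url: str) -> str:
        host = urlsplit(url).hostname or ""
        try:
            return socket.gethostbyname(host)
        except socket.gaierror:
            return host

    def wait_for_slot(url: str) -> None:
        """Block until this URL's backend is allowed another request."""
        key = backend_key(url)
        while True:
            with _lock:
                now = time.monotonic()
                elapsed = now - _last_hit[key]
                if elapsed >= MIN_DELAY_PER_BACKEND:
                    _last_hit[key] = now
                    return
                remaining = MIN_DELAY_PER_BACKEND - elapsed
            time.sleep(remaining)

    # Each fetch worker would call wait_for_slot(url) before issuing the request.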

◧◩◪◨
4. zxcvbn+Dg1[view] [source] 2021-02-08 16:41:34
>>Smerit+nc
With some planning we could have accommodated the 120k rps rate and more, but coming out of the blue it caused a lot of issues: the database shards for historical content were configured for infrequent access to large amounts of old data, the crawler's access pattern completely thrashed our caches, and so on. We did want Google to index us. If there had been an open dialog, we could have created a separate path for their traffic that bypassed the cache, and we could have brought additional database servers into production to handle the increased load. We even had a real-time events feed that updated whenever content was created or updated; we would have given Google free access to it so they could just crawl the changes instead of having to scan the whole site for updates. But since they would not talk to anyone, none of that happened.
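
For context, consuming that kind of events feed from the crawler's side could be as simple as polling a "what changed since cursor X" endpoint and fetching only those URLs. A minimal sketch (the endpoint, response shape, and cursor scheme are hypothetical, since the real feed was never exposed to Google):

    import json
    import time
    import urllib.request

    # Sketch of "crawl the changes instead of scanning the site" from the
    # crawler's side. Feed URL, response shape, and cursor scheme are made up.

    FEED_URL = "https://example.com/events?since={cursor}"

    def poll_changes(cursor: str) -> tuple[list[str], str]:
        """Return (URLs changed since cursor, new cursor) from the events feed."""
        with urllib.request.urlopen(FEED_URL.format(cursor=cursor)) as resp:
            body = json.load(resp)
        return [event["url"] for event in body["events"]], body["next_cursor"]

    def recrawl_loop(fetch, cursor: str = "0", interval: float = 60.0) -> None:
        """Fetch only the pages the feed reports as changed, instead of rescanning everything."""
        while True:
            changed, cursor = poll_changes(cursor)
            for url in changed:
                fetch(url)   # only fresh content, so no trawling through cold shards
            time.sleep(interval)
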
[go to top]