1. baby-y+(OP)[view] [source] 2021-04-15 14:22:36
Could not agree with this more; this has gone on far too long.

Why does Google allow this? As you say, it is 100% cloaking to have the entire article indexed but not show it on the page the user actually lands on.

Sure, publishers feel they need paywalls for revenue purposes; have at it. That should not absolve them from the "rules" everyone else has to follow.

"Cloaking refers to the practice of presenting different content or URLs to human users and search engines. Cloaking is considered a violation of Google's Webmaster Guidelines because it provides our users with different results than they expected." [0]

[0] - https://developers.google.com/search/docs/advanced/guideline...

replies(2): >>leephi+89 >>Apollo+aw
2. leephi+89[view] [source] 2021-04-15 15:10:55
>>baby-y+(OP)
If they are cloaking, can we get around the paywall by using the Google crawler user agent string?
replies(2): >>tyingq+ua >>gpm+4h
3. tyingq+ua[view] [source] [discussion] 2021-04-15 15:18:32
>>leephi+89
That is one of many workarounds the various paywall-buster browser extensions use: setting the user agent to either the Google crawler's or the Google AdsBot's. I would guess you would also need to not send cookies. Publishers could also be clever and check that your IP/netblock is a Google-owned one.
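
To make the mechanism concrete, here is a minimal sketch of the kind of naive server-side check that a spoofed user agent would get past. Everything in it is an assumption for illustration: the function name `serves_full_article`, the token list, and the no-cookie condition are hypothetical, and nothing in this thread shows what any publisher actually runs.

```python
# Hypothetical tokens a naive paywall might look for in the User-Agent header.
CRAWLER_TOKENS = ("Googlebot", "AdsBot-Google")


def serves_full_article(headers: dict) -> bool:
    """Naive cloaking check that trusts the User-Agent header alone.

    Returns True when the request claims to be a Google crawler and
    carries no cookies. The no-cookie condition is an assumption.
    """
    user_agent = headers.get("User-Agent", "")
    claims_crawler = any(token in user_agent for token in CRAWLER_TOKENS)
    has_cookies = bool(headers.get("Cookie"))
    return claims_crawler and not has_cookies
```

Because the header is entirely client-controlled, any browser that sends a crawler-like string would pass this check, which is why an IP-based verification is the stronger option.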
4. gpm+4h[view] [source] [discussion] 2021-04-15 15:46:59
>>leephi+89
Huh, I thought they published a range of IP addresses to prevent this, but apparently they don't use an entirely consistent range, and you need to do a DNS lookup [1] to actually check whether something is Google's crawler. I'm willing to bet most organizations aren't doing that... so maybe.

[1] https://developers.google.com/search/docs/advanced/crawling/...
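
For reference, the DNS check could look roughly like the sketch below: a reverse lookup on the requesting IP, a check that the hostname falls under a Google crawler domain, then a forward lookup to confirm the hostname maps back to the same IP. The function name `is_verified_googlebot`, the injectable lookup parameters, and the suffix list are assumptions made for this example, so check the linked documentation for the domains currently in use.

```python
import socket

# Assumed crawler domains; confirm against Google's current documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-then-forward DNS check for an IP claiming to be Googlebot.

    The lookup functions are injectable so the logic can be exercised
    without network access. The defaults use the standard socket module
    and handle IPv4 only.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        # Step 1: reverse DNS, IP -> hostname.
        hostname = reverse_lookup(ip)
        # Step 2: the hostname must sit under a Google crawler domain.
        if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # Step 3: forward DNS, the hostname must resolve back to the same IP.
        return ip in forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

The forward lookup matters because whoever controls an IP block also controls its reverse DNS records, so a reverse lookup alone could be made to return a Google-looking hostname.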

replies(1): >>leephi+iN
5. Apollo+aw[view] [source] 2021-04-15 16:42:41
>>baby-y+(OP)
Pretty sure it's fear of it being added to an antitrust complaint.

It is really frustrating as a user, and undoubtedly Google knows this. So an impending lawsuit is the only reason I can see for them not blocking NYT and other sites that do this.

6. leephi+iN[view] [source] [discussion] 2021-04-15 17:55:28
>>gpm+4h
I just installed a user agent switcher and tried it on a prominent financial news site. The offer to subscribe was replaced by the article when I reloaded using the Googlebot user agent.