zlacker

1. leephi+(OP) 2021-04-15 15:10:55
If they are cloaking, can we get around the paywall by using the Google crawler user agent string?
replies(2): >>tyingq+m1 >>gpm+W7
2. tyingq+m1 2021-04-15 15:18:32
>>leephi+(OP)
That is one of many workarounds the various paywall-buster browser extensions use: they set the user agent to either the Google crawler's or the Google AdBot's. I would guess you would also need to not send cookies. They could also be clever and check that your IP/netblock is a Google-owned one.
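For what it's worth, the netblock check mentioned above could look roughly like the sketch below on the site's side. This is only an illustration under assumptions: the networks listed are reserved documentation ranges standing in for real crawler netblocks, and the function names are hypothetical, not anything a particular site is known to run.

```python
import ipaddress

# ASSUMPTION: placeholder networks taken from the reserved documentation
# ranges (RFC 5737 / RFC 3849). They are NOT real Google netblocks; a real
# check would load whatever ranges the site actually trusts.
ASSUMED_CRAWLER_NETBLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def claims_to_be_googlebot(user_agent: str) -> bool:
    """True if the User-Agent header merely claims to be Googlebot."""
    return "googlebot" in user_agent.lower()


def ip_in_netblocks(ip: str, netblocks=ASSUMED_CRAWLER_NETBLOCKS) -> bool:
    """True if the client IP falls inside one of the trusted networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a parseable IP address at all, so it cannot be trusted.
        return False
    return any(addr in net for net in netblocks)


def is_spoofed_crawler(user_agent: str, ip: str) -> bool:
    """A request that claims to be the crawler but comes from elsewhere."""
    return claims_to_be_googlebot(user_agent) and not ip_in_netblocks(ip)
```

A site doing this would serve the full article only when the user agent claims to be the crawler and `is_spoofed_crawler` returns False, which is exactly the case a user agent switcher alone cannot fake.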
3. gpm+W7 2021-04-15 15:46:59
>>leephi+(OP)
Huh, I thought they published a range of IP addresses that sites could check to prevent this, but apparently they don't use an entirely consistent range, and you need to do a DNS lookup [1] to actually check whether something is Google's crawler. I'm willing to bet most organizations aren't doing that... so maybe.

[1] https://developers.google.com/search/docs/advanced/crawling/...
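As I understand the procedure in [1], the DNS check is a reverse lookup on the client IP, a check that the resulting hostname is under googlebot.com or google.com, and then a forward lookup to confirm the hostname maps back to the same IP. A minimal Python sketch, assuming those two domain suffixes are the ones to accept; the function names are made up for illustration, and the resolvers are passed in as parameters only so the logic can be exercised without network access:

```python
import socket

# ASSUMPTION: hostname suffixes accepted as belonging to Google's crawler.
ALLOWED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of IP address strings a hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse lookup, suffix check, then forward-confirm the same IP."""
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # The leading dot in each suffix rejects names like "fakegooglebot.com".
    if not hostname.endswith(ALLOWED_SUFFIXES):
        return False
    # Simplification: compares address strings as-is, so differently
    # formatted IPv6 addresses would need normalizing first.
    return ip in forward(hostname)
```

The forward confirmation matters because whoever controls an IP block also controls its reverse DNS records and could make them say anything; only the owner of the domain controls what the hostname resolves to.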

replies(1): >>leephi+aE
4. leephi+aE 2021-04-15 17:55:28
>>gpm+W7
I just installed a user agent switcher and tried it on a prominent financial news site. The offer to subscribe was replaced by the article when I reloaded using the Googlebot user agent.