zlacker

1. Animat (OP) 2023-07-09 05:34:31
> There are problems with robots.txt if you actually try to implement it for a crawler.

Yes, although that's not what people are usually worried about.

I once tried to deal with that in SiteTruth's crawler. There are redirects at the HTTP level, redirects at the HTML level (meta refresh), and the HTTP->HTTPS upgrade. Resolving all that honestly is annoying, but possible. Sometimes you do need to look at the beginning of a file blocked by "robots.txt" to find that it is redirecting you elsewhere. It's like a door that says both "Keep Out" and "Please Use Other Door".

This is more of a pedantic problem than a real one.
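
To make the "Keep Out" / "Please Use Other Door" case concrete, here is a rough sketch of how a crawler might chase both kinds of redirect while recording what robots.txt says at each hop. It is not SiteTruth's actual code. The names `resolve`, `fetch`, `robots_for`, `examplebot`, and the example.com URLs are all made up for illustration, and the policy of peeking at the head of a blocked page only to look for a redirect is an assumption. Only `urllib.robotparser` and `urllib.parse` are real standard-library pieces.

```python
import re
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Rough pattern for <meta http-equiv="refresh" content="0; url=..."> (not a full HTML parser).
META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["']?\s*\d+\s*;\s*url=([^"'>\s]+)""",
    re.IGNORECASE,
)


def robots_for(rules_text):
    """Build a parser from robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def resolve(url, fetch, robots, agent="examplebot", max_hops=5):
    """Follow HTTP and HTML redirects, recording robots.txt permission per hop.

    fetch(url)  -> (status, location, head): hypothetical callable; head is
                   only the beginning of the body, enough to spot a redirect.
    robots(url) -> RobotFileParser for that URL's host (hypothetical callable).
    Returns (final_url, final_url_allowed, hops).
    """
    hops = []
    for _ in range(max_hops):
        allowed = robots(url).can_fetch(agent, url)
        status, location, head = fetch(url)
        hops.append((url, allowed))

        target = None
        if status in (301, 302, 303, 307, 308) and location:
            target = urljoin(url, location)       # redirect at the HTTP level
        else:
            match = META_REFRESH.search(head or "")
            if match:
                target = urljoin(url, match.group(1))  # redirect at the HTML level

        if target is None:
            # No further redirect: only this final URL's permission decides
            # whether the content itself may be used.
            return url, allowed, hops
        url = target
    raise RuntimeError("too many redirects")


# Canned responses standing in for the network (all invented).
pages = {
    "http://example.com/old/page": (
        200, None,
        '<html><head><meta http-equiv="refresh" content="0; url=/new/page"></head>',
    ),
    "http://example.com/new/page": (301, "https://example.com/new/page", ""),
    "https://example.com/new/page": (200, None, "<html>hello</html>"),
}
rules = robots_for("User-agent: *\nDisallow: /old/\n")

final, allowed, hops = resolve(
    "http://example.com/old/page", pages.__getitem__, lambda u: rules
)
```

In this toy run the first hop is disallowed by robots.txt, yet it is the only place that says where the content moved. The sketch reads just enough of it to find the other door and uses content only from the final, permitted URL. A stricter crawler could instead stop at the first disallowed hop.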
