zlacker

[parent] [thread] 6 comments
1. always+(OP)[view] [source] 2023-05-10 13:08:25
Why not serve fake garbage that a computer can't distinguish from real content, like LLM output? Sending errors just incentivizes bot owners to fix the identifiable parts
replies(4): >>shaneb+24 >>twelve+J4 >>ape4+Q6 >>dspill+I9
2. shaneb+24[view] [source] 2023-05-10 13:28:25
>>always+(OP)
"Why not serve fake garbage indistinguishable from real content by a computer, like LLM output?"

Serving more than the minimum wastes resources. Worse yet, a better solution would cost my time.

"Sending errors just incentivizes bot owners to fix the identifiable parts"

Sure, someone could make or configure their scraper perfectly. "Perfect" is now the table stakes though.

Edit:

My solution strives to make circumvention disproportionately expensive. I want 10x on my time.

3. twelve+J4[view] [source] 2023-05-10 13:31:14
>>always+(OP)
it'd be cool to be able to fingerprint that garbage, too. Like, sprinkle some hashes here and there (or something like that) so that you can later look up your own "content" being regurgitated by chatbots and tell which ones stole it.
replies(1): >>shaneb+Q5
4. shaneb+Q5[view] [source] [discussion] 2023-05-10 13:36:55
>>twelve+J4
You can. I can't think of the appropriate term though. Hopefully someone else chimes in here with a link.
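The fingerprinting idea above is often done with per-visitor canary tokens: a deterministic, unique string embedded in each copy of the served text, which you can later search for in a chatbot's output. A minimal sketch (the secret, token format, and helper names here are hypothetical, not from any specific tool):

```python
import hmac
import hashlib

SECRET = b"server-side secret"  # hypothetical key, kept private


def canary(visitor_id: str) -> str:
    # Deterministic per-visitor token derived via HMAC, so a leaked
    # token can be traced back to exactly one visitor.
    digest = hmac.new(SECRET, visitor_id.encode(), hashlib.sha256).hexdigest()[:12]
    return f"Zlbx{digest}"


def embed(text: str, visitor_id: str) -> str:
    # Sprinkle the token into the content served to this visitor.
    return f"{text} (via {canary(visitor_id)})"


def identify(leaked_text: str, known_visitors: list[str]) -> list[str]:
    # Later: check which visitors' canaries appear in model output.
    return [v for v in known_visitors if canary(v) in leaked_text]
```

Real systems would embed the token less conspicuously (e.g. as a fake proper noun or in invisible markup), but the lookup mechanics are the same.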
5. ape4+Q6[view] [source] 2023-05-10 13:41:22
>>always+(OP)
I like this idea. Of course it would have to be served only to robots that visit a page disallowed by the robots.txt
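That honeypot approach can be sketched in a few lines; the trap path and the flagging logic below are illustrative assumptions, not a real server's API:

```python
# Hypothetical honeypot: /trap/ appears as "Disallow: /trap/" in robots.txt,
# so any client that fetches it anyway is ignoring the file.
DISALLOWED_PREFIXES = ("/trap/",)
flagged_ips: set[str] = set()


def handle(ip: str, path: str) -> str:
    # Flag any client that requests a disallowed path.
    if any(path.startswith(p) for p in DISALLOWED_PREFIXES):
        flagged_ips.add(ip)
    # Flagged clients get garbage from then on; everyone else gets real pages.
    if ip in flagged_ips:
        return "fake garbage"  # e.g. cheap pre-generated LLM output
    return "real content"
```

In practice the flag would live in persistent storage keyed on more than IP, but the gating decision is this simple.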
6. dspill+I9[view] [source] 2023-05-10 13:55:12
>>always+(OP)
> Sending errors just incentivizes bot owners to fix the identifiable parts

Nah. It'll just make them fake their identity so it is harder to tell the traffic is from a bot.
