zlacker

8 comments
1. 8n4vid+(OP) 2023-01-14 08:07:00
I don't know if you're right or wrong, but it seems plausible that we could create a database of copyrighted images to check against.
replies(2): >>visarg+02 >>jaapba+A2
2. visarg+02 2023-01-14 08:25:34
>>8n4vid+(OP)
The training set alone should suffice.
3. jaapba+A2 2023-01-14 08:31:48
>>8n4vid+(OP)
Every original image is copyrighted. You're suggesting making a digital copy of every image there is to check that AI isn't generating digital copies of every image there is.
replies(1): >>8n4vid+ea
4. 8n4vid+ea 2023-01-14 09:57:21
>>jaapba+A2
Not a copy, a hash or fingerprint. Just enough data to measure if it's substantially similar.

But yes, it may be infeasible to index and compare against every image ever uploaded.

replies(2): >>Kim_Br+Gx >>galley+vZ
5. Kim_Br+Gx 2023-01-14 14:00:32
>>8n4vid+ea
If I understand correctly, wouldn't a hash database of <just the training set> be larger than the actual model? (in fact by 1 or 2 orders of magnitude?)
replies(2): >>Fillig+RC >>8n4vid+AH3
6. Fillig+RC 2023-01-14 14:47:23
>>Kim_Br+Gx
Approximately, yes.
7. galley+vZ 2023-01-14 17:40:01
>>8n4vid+ea
Couldn't I just add a few nonsense bytes to my images to change the hash/fingerprint?
replies(1): >>8n4vid+lH3
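To illustrate the question above, here is a minimal sketch of what happens to an exact (cryptographic) hash when a few junk bytes are appended. The byte string is only a stand-in for an image file, and SHA-256 is an assumed choice of hash; the thread does not name one.

```python
import hashlib

# Stand-in for the raw bytes of an image file (hypothetical data).
original = bytes(range(256)) * 4
# The same content with a few junk bytes appended.
padded = original + b"\x00\x01\x02"

digest_original = hashlib.sha256(original).hexdigest()
digest_padded = hashlib.sha256(padded).hexdigest()

# An exact hash treats the two files as unrelated.
hashes_match = digest_original == digest_padded
print(hashes_match)  # False
```

So an exact hash is trivially defeated this way; whether a similarity-based fingerprint is also defeated depends on how it is computed.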
8. 8n4vid+lH3 2023-01-15 19:41:13
>>galley+vZ
Hash yes, fingerprint maybe not. Maybe I'm using the term incorrectly here, but I think of a fingerprint as a lossy hash. One way of doing this would be to resize the image to, say, 8 by 8, and quantize it to, say, 16 colors. So the fingerprint size is 8 × 8 × 4 bits = 256 bits = 32 bytes. Tiny changes aren't likely to change the fingerprint. You'd probably have to do something a little more clever so as not to get too many false positives, though. Or once you get a hit, do a deeper comparison.
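The scheme described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the image is a plain 2D list of grayscale values (0 to 255), "16 colors" is simplified to 16 gray levels, the resize is done by block averaging, and the names `fingerprint` and `distance` are made up for this example.

```python
GRID = 8      # resize target: 8 by 8 cells
LEVELS = 16   # 16 levels per cell, i.e. 4 bits (gray levels: an assumption)

def fingerprint(pixels):
    """Return a 32-byte lossy fingerprint of a 2D grayscale image.

    `pixels` is a rectangular list of rows with values 0..255 and must be
    at least GRID x GRID in size.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(GRID):
        y0, y1 = gy * height // GRID, (gy + 1) * height // GRID
        for gx in range(GRID):
            x0, x1 = gx * width // GRID, (gx + 1) * width // GRID
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(block) / len(block)
            # Quantize the block average to one of 16 levels.
            cells.append(min(LEVELS - 1, int(mean * LEVELS / 256)))
    # Pack two 4-bit cells into each byte: 64 cells -> 32 bytes.
    return bytes((cells[i] << 4) | cells[i + 1] for i in range(0, len(cells), 2))

def distance(fp_a, fp_b):
    """Count how many of the 64 cells differ between two fingerprints."""
    diff = 0
    for a, b in zip(fp_a, fp_b):
        diff += (a >> 4) != (b >> 4)
        diff += (a & 0x0F) != (b & 0x0F)
    return diff

# Hypothetical 64x64 test image: a smooth diagonal gradient.
image = [[2 * x + 2 * y for x in range(64)] for y in range(64)]

# The same image with a single pixel nudged by one level.
tweaked = [row[:] for row in image]
tweaked[10][10] += 1

# A clearly different image: the inverted gradient.
inverted = [[255 - v for v in row] for row in image]

fp = fingerprint(image)
print(len(fp))                            # 32
print(distance(fp, fingerprint(tweaked)))
print(distance(fp, fingerprint(inverted)))
```

A small cell distance would then count as a "hit" that triggers the deeper comparison mentioned above. Something this coarse would likely produce many false positives on real images, so the threshold and the follow-up check matter.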
9. 8n4vid+AH3 2023-01-15 19:43:29
>>Kim_Br+Gx
Yeah, I guess so. The models are only 4 or 8 GB. A giant list of hashes would be bigger, sure. But they're two very different things: the model is for generating new images, while the hash database is for copyright enforcement. If you really want to check for violations, I don't know how else you're going to do it.
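As a back-of-the-envelope check on the size comparison, here is a rough calculation. The 32-byte fingerprint and the 4 GB model size come from the thread; the image count is purely an assumed figure for illustration and should be replaced with the real training-set size.

```python
FINGERPRINT_BYTES = 32                 # 8 x 8 cells x 4 bits, from the thread
MODEL_BYTES = 4 * 10**9                # the lower "4 GB" figure from the thread
ASSUMED_IMAGE_COUNT = 2_000_000_000    # hypothetical; not stated in the thread

database_bytes = FINGERPRINT_BYTES * ASSUMED_IMAGE_COUNT
ratio = database_bytes / MODEL_BYTES

print(f"{database_bytes / 10**9:.0f} GB, {ratio:.0f}x the model")
# prints "64 GB, 16x the model"
```

Under that assumed count, the fingerprint list comes out about one order of magnitude larger than the model, before any index structure is added on top.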