zlacker

5 comments
1. 8n4vid+(OP)[view] [source] 2023-01-14 09:57:21
Not a copy, a hash or fingerprint. Just enough data to measure if it's substantially similar.

But yes, it may be infeasible to index and compare against every image ever uploaded.

replies(2): >>Kim_Br+sn >>galley+hP
2. Kim_Br+sn[view] [source] 2023-01-14 14:00:32
>>8n4vid+(OP)
If I understand correctly, wouldn't a hash database of just the training set be larger than the actual model (in fact, by 1 or 2 orders of magnitude)?
replies(2): >>Fillig+Ds >>8n4vid+mx3
3. Fillig+Ds[view] [source] [discussion] 2023-01-14 14:47:23
>>Kim_Br+sn
Approximately, yes.
4. galley+hP[view] [source] 2023-01-14 17:40:01
>>8n4vid+(OP)
Couldn't I just add a few nonsense bytes to my images to change the hash/fingerprint?
replies(1): >>8n4vid+7x3
5. 8n4vid+7x3[view] [source] [discussion] 2023-01-15 19:41:13
>>galley+hP
Hash yes, fingerprint maybe not. Maybe I'm using the term incorrectly here, but I think of a fingerprint as a lossy hash (what is usually called a perceptual hash). One way of doing this would be to resize the image to, say, 8 by 8 and quantize it to, say, 16 colors, which is 4 bits per pixel. So the fingerprint size is 8×8×4 bits = 256 bits = 32 bytes. Tiny changes aren't likely to change the fingerprint. You'd probably have to do something a little more clever so as not to get too many false positives, though. Or once you get a hit, do a deeper comparison.
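A rough sketch of that kind of fingerprint, to make the idea concrete. Assumptions that are mine, not from the thread: the image is a grayscale grid given as a list of rows (so "16 colors" becomes 16 gray levels), the resize is plain block averaging, and the names `fingerprint`, `image` and `tweaked` are made up for illustration. It also shows the contrast with an exact hash such as SHA-256.

```python
import hashlib

SIZE = 8     # fingerprint grid is SIZE x SIZE cells
LEVELS = 16  # 16 gray levels -> 4 bits per cell

def fingerprint(pixels):
    """Return a 32-byte lossy fingerprint of a grayscale image.

    pixels: list of rows, each a list of ints in 0..255.
    Assumes the image is at least SIZE pixels wide and tall.
    """
    h, w = len(pixels), len(pixels[0])
    cells = []
    for by in range(SIZE):
        y0, y1 = by * h // SIZE, (by + 1) * h // SIZE
        for bx in range(SIZE):
            x0, x1 = bx * w // SIZE, (bx + 1) * w // SIZE
            # "Resize" by averaging every pixel that falls in this cell.
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(block) / len(block)
            # Quantize the 0..255 average down to 0..15.
            cells.append(min(LEVELS - 1, int(mean * LEVELS / 256)))
    # Pack two 4-bit cells into each byte: 64 cells -> 32 bytes.
    return bytes((cells[i] << 4) | cells[i + 1] for i in range(0, len(cells), 2))

def exact_hash(pixels):
    """SHA-256 over the raw pixel bytes, for comparison."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

# Synthetic 64x64 test image made of flat 8x8 tiles.
image = [[16 * (((x // 8) + (y // 8)) % 16) + 8 for x in range(64)]
         for y in range(64)]

# Nudge a single pixel by one gray level.
tweaked = [row[:] for row in image]
tweaked[0][0] += 1

print(len(fingerprint(image)))                     # 32
print(fingerprint(image) == fingerprint(tweaked))  # True
print(exact_hash(image) == exact_hash(tweaked))    # False
```

The synthetic image is deliberately easy: its cell averages sit in the middle of a quantization bucket. A value close to a bucket boundary could flip with a small edit, which is one reason a real scheme would need something more robust plus a deeper comparison on a hit.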
6. 8n4vid+mx3[view] [source] [discussion] 2023-01-15 19:43:29
>>Kim_Br+sn
Yeah, I guess so. The models are only 4 or 8 GB. A giant list of hashes would be bigger, sure. But they're two very different things: the model is for generating new images, while the hash database is for copyright enforcement. If you really want to check for violations, I don't know how else you're going to do it.
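Back-of-the-envelope version of the size comparison, as a sketch only. The 32 bytes per entry comes from the 8 by 8, 16-color fingerprint discussed in this thread and the 4 GB model size from the comment above; the image count of 2 billion is a hypothetical placeholder, not a figure from the thread, and `database_size_gb` is an illustrative name.

```python
BYTES_PER_FINGERPRINT = 8 * 8 * 4 // 8  # 8x8 cells at 4 bits each = 32 bytes

def database_size_gb(num_images, bytes_per_entry=BYTES_PER_FINGERPRINT):
    """Raw size of a flat fingerprint list in GB (10**9 bytes), no index overhead."""
    return num_images * bytes_per_entry / 1e9

HYPOTHETICAL_IMAGES = 2_000_000_000  # placeholder count, not from the thread
MODEL_GB = 4                         # lower model size mentioned above

db_gb = database_size_gb(HYPOTHETICAL_IMAGES)
print(db_gb)             # 64.0
print(db_gb / MODEL_GB)  # 16.0
```

Under those assumptions the list is roughly an order of magnitude larger than the model, before counting any index needed to search it quickly. A larger per-image fingerprint or a larger training set scales the result linearly.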