zlacker

[return to "GitHub Copilot, with “public code” blocked, emits my copyrighted code"]
1. seanwi+Cq 2022-10-16 23:32:02
>>davidg+(OP)
For DALL-E and Stable Diffusion, isn't the model at least an order of magnitude smaller than the total size of all the training set images? If so, the model can't regurgitate every image in the training set exactly.

Does a similar argument hold for Copilot? Or is its model large enough to contain the training set verbatim?

2. numpad+xz 2022-10-17 00:56:44
>>seanwi+Cq
How small is DALL-E/SD compared to, say, the training dataset images shrunk to 120x120, JPEG-compressed at q=0.3, and packed as a .tar.bz2?
3. wccraw+Av1 2022-10-17 11:53:21
>>numpad+xz
IIRC, 2.5 billion images were used to train a model that is about 4.5GB. That is less than 2 bytes per original image.
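
To make that arithmetic concrete, here is a rough sketch in Python. It takes the 2.5 billion and 4.5GB figures as quoted from memory above (not verified here) and assumes decimal gigabytes. For scale it also computes the size of an uncompressed 120x120 RGB thumbnail at 3 bytes per pixel, the resolution mentioned upthread. The function names are made up for illustration.

```python
# Figures as recalled in the comment above; treat them as approximate.
NUM_IMAGES = 2_500_000_000   # 2.5 billion training images
MODEL_BYTES = 4_500_000_000  # 4.5 GB, assuming 1 GB = 10^9 bytes


def bytes_per_image(model_bytes: int, num_images: int) -> float:
    """Average number of model bytes available per training image."""
    return model_bytes / num_images


def raw_thumbnail_bytes(side: int = 120, channels: int = 3) -> int:
    """Size of an uncompressed square thumbnail with 1 byte per channel."""
    return side * side * channels


budget = bytes_per_image(MODEL_BYTES, NUM_IMAGES)
thumb = raw_thumbnail_bytes()

print(f"{budget:.2f} bytes of model per training image")
print(f"{thumb} bytes in a raw 120x120 RGB thumbnail")
print(f"the thumbnail is about {thumb / budget:,.0f}x the per-image budget")
```

This only compares against an uncompressed thumbnail. How far JPEG at low quality plus bzip2 would shrink each thumbnail depends on the images, so the sketch makes no guess at that.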