zlacker

[return to "GitHub Copilot, with “public code” blocked, emits my copyrighted code"]
1. seanwi+Cq 2022-10-16 23:32:02
>>davidg+(OP)
For DALL-E and Stable Diffusion, isn't the model at least an order of magnitude smaller than the combined size of all the training set images? If so, the model can't possibly regurgitate every image in the training set exactly, right?

For Copilot, is there a similar argument? Or is its model large enough to contain the training set verbatim?

◧◩
2. numpad+xz 2022-10-17 00:56:44
>>seanwi+Cq
How small is DALL-E/SD compared to, say, the training dataset images shrunk to 120x120, JPEG-compressed at q=0.3, and packed into a .tar.bz2?
◧◩◪
3. Shamel+Tz 2022-10-17 01:00:10
>>numpad+xz
> The data can comfortably be downloaded with img2dataset (240TB in 384, 80TB in 224)

https://laion.ai/blog/laion-5b/

Not exactly what you asked, but hopefully useful? The model weights are about 4 GiB I believe.
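
For a rough sense of scale, the figures above can be plugged into a quick back-of-the-envelope calculation. This is only a sketch: the dataset sizes and the ~4 GiB weight size come from this thread, while the image count of about 5 billion is an assumption read off the "5B" in the dataset name, and the subset actually used to train any given model may well differ. TB is treated as a decimal terabyte.

```python
# Back-of-the-envelope comparison of model size vs. training-set size.
# Figures from the thread: 240 TB at 384px, 80 TB at 224px, weights ~4 GiB.
# ASSUMPTION: ~5 billion images, taken from the "5B" in the dataset name.

TB = 10**12   # decimal terabyte (assumed interpretation of "TB")
GIB = 2**30   # binary gibibyte

DATASET_BYTES = {"384px": 240 * TB, "224px": 80 * TB}
MODEL_BYTES = 4 * GIB
NUM_IMAGES = 5_000_000_000  # assumed, not stated in the thread


def compression_ratio(dataset_bytes: int, model_bytes: int) -> float:
    """How many times larger the dataset is than the model weights."""
    return dataset_bytes / model_bytes


def model_bytes_per_image(model_bytes: int, num_images: int) -> float:
    """Average number of weight bytes available per training image."""
    return model_bytes / num_images


if __name__ == "__main__":
    for label, size in DATASET_BYTES.items():
        ratio = compression_ratio(size, MODEL_BYTES)
        print(f"{label}: dataset is ~{ratio:,.0f}x the model size")
    per_image = model_bytes_per_image(MODEL_BYTES, NUM_IMAGES)
    print(f"~{per_image:.2f} bytes of weights per image")
```

Under these assumptions the dataset is four to five orders of magnitude larger than the weights, and the model has less than one byte per training image on average. That rules out storing every image verbatim, though it says nothing about whether a small number of heavily duplicated images could still be memorized.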
