If so, the correct analogy is not a collage but a musical scale: yes, Beethoven used notes that Bach had used before him, but it was not exactly a copy.
Thus, when we see things, we have already built a relationship map of the parts of an image, not the actual pixels. This makes it possible to observe the world and interact with it in real time, referencing the pieces and the concepts we label them with; otherwise we'd have to stop and very carefully look around every single time we wanted to take a step.
These networks effectively do the same thing, taking in parts of images and their relationships. It's not uncommon for me to see what is clearly a distorted copy of a Getty Images trademark when I run Stable Diffusion locally. There's an artist who always puts his daughter Nina's name in his work... the network just treats it as another style, and I suspect it's the same for the Getty thing.
One thing that is super cool is that you can draw a horribly amateur sketch of something and have Stable Diffusion turn it into an image that stays close to the original in outline but is far better in detail.
A sketch of a flower I did came out as tulips, roses, or a poppy depending on the prompt used to process it, but it stayed in roughly the same pose and scale.
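For anyone who wants to try it, the workflow is basically img2img: the sketch goes in as the starting image and the prompt steers the detail. A rough sketch of how that looks with the Hugging Face diffusers library (the model id, file names, and strength value here are just illustrative, not my exact setup):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    # Load a Stable Diffusion checkpoint (any SD 1.x model id works here)
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # The amateur sketch becomes the starting point for denoising
    sketch = Image.open("flower_sketch.png").convert("RGB").resize((512, 512))

    # strength controls how far the model wanders from the sketch:
    # low values preserve the original outline, high values repaint more freely
    result = pipe(
        prompt="a single tulip, detailed watercolor painting",
        image=sketch,
        strength=0.55,
        guidance_scale=7.5,
    ).images[0]
    result.save("flower_out.png")

Swap "tulip" for "rose" or "poppy" in the prompt and the composition mostly stays put, because the sketch anchors the early denoising steps.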