I feel like this is the main distinction.
Um, yes.[1][2] What else would they be trained on?
According to the model card:
[1] https://github.com/CompVis/stable-diffusion/blob/main/Stable...
it was trained on this data set (which has hyperlinks to images, so feel free to peruse):