I just scraped data from Reddit and other sources so I could build an NSFW classifier, and chose to open-source the data and the model for the general good.
Note that I was an engineer with one year of experience, working alone on this project in my free time, so it was basically impossible for me to review the 100,000+ images in the dataset or clear out the few CSAM images among them.
Now, though, I wonder if I should never have open-sourced the data. That would have avoided a lot of these issues.