It seems that a lot of users upload videos that are published with the default settings, and are therefore visible from the outside. Even if they change the settings fairly quickly, automated systems like ours will already know that the video exists.
There could be other reasons, but this seems the most likely, especially as a video that is being uploaded can be published fairly swiftly.
Even if it isn't illegal, linking to private videos like this seems unethical at the very least. It would also seem trivial for you to re-scrape your database every now and then, check whether any existing videos have changed from listed to unlisted, and remove the ones that have.
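A minimal sketch of that periodic re-check, assuming the index is a simple dict of video ID to last known visibility and assuming a hypothetical `fetch_visibility` callback that asks the platform for a video's current setting (neither is Pex's actual system):

```python
from typing import Callable, Dict, List


def prune_hidden_videos(
    index: Dict[str, str],
    fetch_visibility: Callable[[str], str],
) -> List[str]:
    """Re-check every stored video and drop the ones that are no longer public.

    `index` maps video IDs to their last known visibility. `fetch_visibility`
    is a hypothetical lookup returning "public", "unlisted" or "private".
    Returns the IDs that were removed from the index.
    """
    removed = []
    for video_id in list(index):  # copy the keys so we can delete while looping
        current = fetch_visibility(video_id)
        if current != "public":
            del index[video_id]
            removed.append(video_id)
    return removed


# Usage with made-up data: "b" was switched to unlisted, "c" to private.
index = {"a": "public", "b": "public", "c": "public"}
now = {"a": "public", "b": "unlisted", "c": "private"}
print(prune_hidden_videos(index, now.__getitem__))  # ['b', 'c']
print(index)  # {'a': 'public'}
```

The catch, as pointed out further down, is the cost of running the loop over the whole index often enough.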
I think a better approach for everyone involved would be to only store references to videos that were posted more than x minutes ago. I'm not sure whether they have that information when scraping, though.
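If the scrape does expose a publish time, the filter could be as simple as the sketch below. The `SeenVideo` record, its `published_at` field, and the 30-minute threshold in the usage example are all hypothetical stand-ins for "x minutes":

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple, Optional


class SeenVideo(NamedTuple):
    video_id: str
    published_at: datetime  # assumed to be available from the scrape


def old_enough(
    videos: Iterable[SeenVideo],
    min_age: timedelta,
    now: Optional[datetime] = None,
) -> List[SeenVideo]:
    """Keep only videos that have been public for at least `min_age`."""
    now = now or datetime.now(timezone.utc)
    return [v for v in videos if now - v.published_at >= min_age]


# Usage: only the 90-minute-old video passes a 30-minute threshold.
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
videos = [
    SeenVideo("fresh", now - timedelta(minutes=5)),
    SeenVideo("settled", now - timedelta(minutes=90)),
]
print([v.video_id for v in old_enough(videos, timedelta(minutes=30), now)])  # ['settled']
```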
>It seems that a lot of users will upload video which is by default published [and then they change it to private] //
So to avoid that sort of unexpected publication, only one extra scrape would be needed. Or, if they knew the period within which the setting is normally changed, they could just delay the scrape until most videos would already have been changed.
I imagine, though, that part of the 'fun' is catching inadvertent publication, and morality isn't considered.
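The "one extra scrape after a delay" idea doesn't need a publish time at all; discovery time is enough. A minimal sketch, assuming a hypothetical `fetch_visibility` lookup and timestamps in seconds: new IDs wait in a min-heap until their delay has elapsed, get checked once more, and only the ones still public are released for storage.

```python
import heapq
from typing import Callable, List, Tuple


class DelayedRecheck:
    """Hold newly discovered videos for `delay` seconds, then check them once more."""

    def __init__(self, delay: float, fetch_visibility: Callable[[str], str]):
        self.delay = delay
        self.fetch_visibility = fetch_visibility  # hypothetical platform lookup
        self._pending: List[Tuple[float, str]] = []  # min-heap of (due_time, video_id)

    def discover(self, video_id: str, now: float) -> None:
        """Record a newly seen video; it becomes due at `now + delay`."""
        heapq.heappush(self._pending, (now + self.delay, video_id))

    def release_due(self, now: float) -> List[str]:
        """Return IDs whose delay has elapsed and which are still public."""
        accepted = []
        while self._pending and self._pending[0][0] <= now:
            _, video_id = heapq.heappop(self._pending)
            if self.fetch_visibility(video_id) == "public":
                accepted.append(video_id)
        return accepted


# Usage: "v2" is made private inside the 10-minute window, so it is never stored.
visibility = {"v1": "public", "v2": "public"}
queue = DelayedRecheck(delay=600, fetch_visibility=visibility.__getitem__)
queue.discover("v1", now=0)
queue.discover("v2", now=0)
visibility["v2"] = "private"
print(queue.release_due(now=300))  # []
print(queue.release_due(now=600))  # ['v1']
```

The delay length is the only tuning knob; picking it well would require knowing how long owners usually take to change the setting.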
Also, I don't believe unlisted videos are considered private. There is a separate private setting that prevents the public from seeing a video.
And finally, it's far from trivial to revisit 5.5 billion videos often enough to see whether any of them have become unlisted.
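To put the 5.5 billion figure in perspective, here is a rough back-of-the-envelope calculation. It assumes one request per video per pass and ignores batching, rate limits, and retries, so it is only an order-of-magnitude estimate:

```python
TOTAL_VIDEOS = 5_500_000_000  # figure quoted in the thread


def checks_per_second(total: int, interval_seconds: float) -> float:
    """Sustained request rate needed to re-check `total` videos once per interval."""
    return total / interval_seconds


for label, seconds in [("hour", 3600), ("day", 86_400), ("week", 7 * 86_400)]:
    print(f"every {label}: {checks_per_second(TOTAL_VIDEOS, seconds):,.0f} checks/s")
# every hour: 1,527,778 checks/s
# every day: 63,657 checks/s
# every week: 9,094 checks/s
```

Even a weekly pass means roughly nine thousand lookups every second, around the clock.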
It would defeat the purpose of our service if we delayed our identification, and adding such capabilities to our system would require significant engineering effort, with a significant economic impact on our business.