Also, detecting videos that are inappropriate for children is a lot harder than identifying which content creators can be trusted to post appropriate videos (and to tag them correctly). That trust can be learned from the creator's history: how many times their videos have been flagged, whether they get upvotes from users who are themselves deemed credible, and so on. The more layers of indirection, the better, a la PageRank.
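To make the PageRank comparison concrete, here is a minimal sketch of how trust could flow through upvotes. It is not how YouTube does it. The function name `propagate_trust`, the `endorsements` and `flag_counts` inputs, the damping factor, and the "divide by 1 + flags" penalty are all assumptions made up for illustration.

```python
def propagate_trust(endorsements, flag_counts, damping=0.85, iterations=50):
    """PageRank-style trust scores (illustrative sketch only).

    endorsements: dict mapping a user to the list of users they upvoted.
    flag_counts:  dict mapping a user to how many times their videos were flagged.
    """
    users = set(endorsements) | set(flag_counts)
    for targets in endorsements.values():
        users.update(targets)
    if not users:
        return {}

    n = len(users)
    trust = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Everyone starts each round with a small baseline amount of trust.
        updated = {u: (1.0 - damping) / n for u in users}
        for endorser, targets in endorsements.items():
            if not targets:
                continue
            # An upvote is worth more when the endorser is trusted, and less
            # when the endorser upvotes many people. Users who endorse nobody
            # simply pass nothing on (a simplification of real PageRank).
            share = damping * trust[endorser] / len(targets)
            for target in targets:
                updated[target] += share
        trust = updated

    # Hypothetical penalty: each flag shrinks the final score.
    return {u: trust[u] / (1 + flag_counts.get(u, 0)) for u in users}


scores = propagate_trust({"a": ["c"], "b": ["c"], "c": ["a"]}, {"b": 2})
```

In this toy graph nobody upvotes "b", so "b" ends up with the lowest score even before the flag penalty. The indirection shows up in the loop: "a" is trusted mostly because "c" is, and "c" is trusted because others vouch for it.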
So even without analyzing the videos themselves, the algorithm would have a much smaller set of videos to recommend from, but still potentially millions. You still need some staff to train the algorithm, but you don't need paid staff to look at every single video to end up with a good set of recommendable videos. The staff might spend most of their time on anomalous videos, such as ones posted by a user the algorithm trusted but then flagged by a user the algorithm considered credible. They would then tag that video with rich information that helps the algorithm in the future, beyond just removing the video or reducing the trust of the poster or the credibility of the flagger.
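A rough sketch of how that anomaly queue could be built, assuming the trust and credibility scores already exist as numbers between 0 and 1. The names `Flag` and `review_queue`, the 0.7 thresholds, and the "trust times credibility" ranking are all hypothetical choices for illustration, not a description of any real system.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    video_id: str
    flagger: str


def review_queue(video_posters, flags, poster_trust, flagger_credibility,
                 trust_min=0.7, cred_min=0.7):
    """Return video ids for human review, most surprising first.

    A video is anomalous here when a trusted poster was flagged by a
    credible flagger, i.e. two signals the algorithm relies on disagree.
    """
    surprise = {}
    for flag in flags:
        poster = video_posters.get(flag.video_id)
        if poster is None:
            continue  # flag refers to a video we know nothing about
        trust = poster_trust.get(poster, 0.0)
        credibility = flagger_credibility.get(flag.flagger, 0.0)
        if trust >= trust_min and credibility >= cred_min:
            # Keep the strongest disagreement seen for this video.
            score = trust * credibility
            surprise[flag.video_id] = max(surprise.get(flag.video_id, 0.0), score)
    return sorted(surprise, key=surprise.get, reverse=True)


queue = review_queue(
    video_posters={"v1": "alice", "v2": "bob", "v3": "alice"},
    flags=[Flag("v1", "carol"), Flag("v2", "carol"),
           Flag("v3", "dave"), Flag("v3", "erin")],
    poster_trust={"alice": 0.9, "bob": 0.2},
    flagger_credibility={"carol": 0.8, "dave": 0.1, "erin": 0.95},
)
# queue == ["v3", "v1"]
```

Here "v2" never reaches a human because the poster was not trusted in the first place, so a flag on it is not a surprise and can be handled automatically. Only the cases where the algorithm's own beliefs conflict cost staff time.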
https://www.theverge.com/2019/2/11/18220032/youtube-copystri...