zlacker

[parent] [thread] 0 comments
1. nickpp+(OP)[view] [source] 2023-11-22 19:09:18
Is this a real problem model trainers actually face or is it an imagined one? The Internet is already full of garbage - 90% of the unpleasantness of browsing these days is filtering through mounts and mounds of crap. Some is generated, some is written, but still crap full of wrong and lies.

I would've imagined training sets were heavily curated and annotated. We already know how to solve this problem for training humans (or our kids would never learn anything useful) so I imagine we could solve it similarly for AIs.

In the end, if it's quality content, learning it is beneficial - no matter who produced it. Garbage needs to be eliminated and the distinction is made either by human trainers or already trained AIs. I have no idea how to train the latter but I am no expert in this field - just like (I suspect) the author of that blog.

[go to top]