>>petarg
That's not ideal, and it's probably picking up some of the original training set from GPT-2 (this model is bolted on top of it)! How about DUOLINGOLOGY instead https://bit.ly/3fPGP8q
I'm using a blacklist to reject "real" words, but it's surprisingly hard to make it cover rare words. I'm up to ~600K entries after parsing Wikipedia tokens, and it still doesn't catch everything.
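The filtering step is roughly like the sketch below (a minimal illustration only, assuming a plain-text Wikipedia dump at `wiki.txt` and a hypothetical `generate_word()` wrapping the model; the real pipeline may tokenize and normalize differently):

```python
import re

def build_blacklist(path="wiki.txt"):
    """Collect lowercase alphabetic tokens from a plain-text Wikipedia dump.

    Rare inflections, proper nouns, and words missing from the dump still
    slip through, which is why the list keeps growing.
    """
    blacklist = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            for token in re.findall(r"[a-z]+", line.lower()):
                blacklist.add(token)
    return blacklist

def is_novel(word, blacklist):
    """Accept a generated word only if it isn't a known real word."""
    return word.lower() not in blacklist

# Hypothetical usage with a generate_word() sampled from the model:
# blacklist = build_blacklist()
# candidates = (generate_word() for _ in range(100))
# fake_words = [w for w in candidates if is_novel(w, blacklist)]
```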