It's a hive of misinformation, disinformation and toxicity. It's succinct I guess, but nothing is eloquent or descriptive because of the character limit. And it's full of repetitive "filler" information.
Who wants that in a foundational LLM dataset?
Maybe it's OK for finding labeled images... But that still seems kinda iffy.
Or maybe you want to get an aggregate idea of what people are currently talking about in the world, stuff that doesn't rise to the level of capital-N News. There aren't a lot of alternatives for that.
Yeah, lots of general chat is unfortunately stuck in Twitter (or difficult-to-scrape, siloed-off platforms).