It's a hive of misinformation, disinformation, and toxicity. It's succinct, I guess, but nothing is eloquent or descriptive because of the character limit. And it's full of repetitive "filler" content.
Who wants that in a foundational LLM dataset?
Maybe it's OK for finding labeled images... but even that seems kinda iffy.
Twitter is great for examples of that, and the toxicity and disinformation don't get in the way.
Conversely, a training set doesn't need to be up to date to be useful for that.
I don't know if anyone really was trying to scrape it (examples of Musk disagreeing with his own engineers come to mind), but I assume it's possible, and given the quality of code ChatGPT spits out, I can easily believe a really bad scraper has been produced by someone who thought they could do without hiring a software developer. If so, they might think they can get hot stock tips or forewarning of a pandemic from which emoji people post, or something like that. That's not really what an LLM is for, but loads of people (even here!) conflate all the different kinds of AI into one thing.
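For what it's worth, here's a minimal sketch of the kind of naive scraper I have in mind: a plain HTTP fetch plus HTML parsing. The profile URL pattern and the selector are just placeholders, and since Twitter/X builds its timeline client-side in JavaScript, a fetch like this comes back mostly empty, which is roughly the failure mode I'd expect from unreviewed ChatGPT output:

    # Naive "fetch the page and parse the HTML" scraper. The profile URL
    # and the <p> selector are hypothetical placeholders; real Twitter/X
    # pages render tweets in JavaScript, so the initial HTML contains
    # almost no tweet text and this returns next to nothing.
    import requests
    from bs4 import BeautifulSoup

    def scrape_tweets(user: str) -> list[str]:
        resp = requests.get(f"https://twitter.com/{user}", timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        # Grabs whatever static paragraph text happens to be in the
        # server-rendered shell, not the actual tweets.
        return [p.get_text(strip=True) for p in soup.select("p")]

    if __name__ == "__main__":
        print(scrape_tweets("example_user"))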