
1. gmerc+(OP) 2023-07-02 03:53:41
It’s quite ignorant to assume petabytes of garbage have any value at this point. See Chinchilla.
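
(For anyone who hasn't read it, a quick sketch of the result being cited, Hoffmann et al. 2022; the constants below are that paper's fitted values, not mine. It models pretraining loss as a function of parameter count N and training tokens D:

  \[
    \hat{L}(N, D) \;=\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
    \qquad E \approx 1.69,\ A \approx 406.4,\ B \approx 410.7,\ \alpha \approx 0.34,\ \beta \approx 0.28
  \]

Minimizing that loss under a fixed compute budget C ≈ 6ND gives compute-optimal N and D that grow roughly in proportion, about 20 training tokens per parameter, which is why it gets cited as evidence that the supply of usable training tokens, not parameter count, is the constraint.)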
replies(1): >>berkle+Rc
2. berkle+Rc 2023-07-02 06:36:13
>>gmerc+(OP)
I agree, but there are hundreds if not thousands of AI startups trying to make their own relevant LLM, and they're going to be scraping Twitter. The Onion called it many years ago [1]: "400 billion tweets and not one useful bit of data was ever transmitted".

[1] https://www.youtube.com/watch?v=cqggW08BWO0&t=138s

replies(1): >>rightb+Zn
3. rightb+Zn 2023-07-02 08:39:22
>>berkle+Rc
I can't imagine worse training data than e.g. Twitter and Reddit posts. How about like, dunno, books?

Edit: Ah, nvm, if you are trying to build a chat bot, that is essentially what you want.
