Let’s also be clear that making deals with Reddit isn’t stealing from creators, it’s not a platform where you own what you type in, same on here this is all public domain with no assumed rights to the text. If you write a book and openAI trains on it and starts telling it to kids at bed time, you 100% will have a legal claim in the future, but the companies already have protections in place to prevent exactly that. For example if you own your website you can request the data not be crawled, but ultimately if your text is publicly available anyone is allowed to read it, and the question it is anyone allowed to train AI on it is an open question that companies are trying to get ahead on.
GPT can't get retroactively untrained on stolen data.
I’m not sure what you mean by “steal” because it’s a relative term now, me reading your book isn’t stealing if I paid for it and it inspires me to write my own novel about a totally new story. And if you posted your book online, as of right now the legal precedent is you didn’t make any claims to it (anyone could read it for free) so that’s fair game to train on, just like the text I’m writing now also has no protections.
Nearly all Reddit history ever up to a certain date is available for download now online, only until they changed their policies did they start having tighter controls about how their data could be used.