It's not a fallacy. Behind the AI are 180M users inputting their own problems and giving their own guidance. Those millions of books only teach language skills; they are not memorized verbatim, except in rare instances of duplicated text in the training set. There is not enough space in a model to store 10 trillion tokens.
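
To put rough numbers on the storage point, here is a back-of-envelope sketch. Only the 10 trillion token figure comes from the comment above; the bytes per token, the parameter count, and the weight precision are assumptions picked for illustration, not figures for any particular model.

```python
# Back-of-envelope comparison of raw training text vs. model weights.
# Every constant except the token count is an assumption for illustration.
TRAINING_TOKENS = 10e12   # "10 trillion tokens", from the comment above
BYTES_PER_TOKEN = 4       # assumption: roughly 4 bytes of raw text per token
PARAMS = 70e9             # assumption: a hypothetical 70B-parameter model
BYTES_PER_PARAM = 2       # assumption: 16-bit weights


def corpus_to_model_ratio(tokens, bytes_per_token, params, bytes_per_param):
    """Return how many times larger the raw training text is than the weights."""
    corpus_bytes = tokens * bytes_per_token
    model_bytes = params * bytes_per_param
    return corpus_bytes / model_bytes


ratio = corpus_to_model_ratio(
    TRAINING_TOKENS, BYTES_PER_TOKEN, PARAMS, BYTES_PER_PARAM
)
print(f"training text is about {ratio:.0f}x larger than the weights")
```

Under these assumed values the raw text is a few hundred times larger than the weights. That is consistent with the point above, and it still leaves room for the rare duplicated passages that do get memorized.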

And if we wanted to replicate copyrighted text with an LLM, it would still be a bad idea: it's better to just find a copy online, which is faster, more precise, and usually free. We often post paywalled articles in the comments here; it's so easy to circumvent paywalls that we don't even blink at it.

Using LLMs to infringe is not even the intended purpose, and it only happens when the user makes a special effort to prompt the model with the first paragraph of the text.

What I find offensive is restricting the circulation of ideas under the guise of copyright. Copyright should protect only expression, not the underlying ideas and styles; those are free to learn, and AIs are just an extension of their human users.