1. jakein (OP) 2023-12-27 16:47:51
I don't see a judge ruling that training a model on copyrighted works is infringement; I think (hope) that training gets ruled protected fair use. It's the LLM's output behaviour, specifically the model's willingness to reproduce text verbatim, that is clearly a copyright violation and should rightfully result in royalties being paid out.

That also seems technically feasible to filter out or cite, though at serious cost, in both compute and user-facing latency. Verbatim text should be easy to identify, although it may require a Google-Search-level amount of indexing and compute.

Summaries and text "in the style of" the NYT or others are the tricky part. I'm not sure there's any high-precision way to identify those on the output side of an LLM, though I can imagine a GAN-style discriminator trained to flag them (erring on the side of false positives). Filtering out suspiciously infringe-ish outputs and re-running inference seems much more solvable than perfect citations for non-verbatim output; a rough sketch of that loop is below.
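To make the verbatim case concrete, here's a minimal sketch of the filter-and-retry idea, assuming a shingle index over the protected corpus. Everything here is a placeholder: the `llm` callable, the 8-token shingle size, and the 0.2 overlap threshold are illustrative assumptions, not anything a real deployment is known to use.

```python
import hashlib

NGRAM = 8  # shingle length in tokens; a real system would tune this


def shingles(text: str, n: int = NGRAM):
    """Yield a hash for every n-token window of the text."""
    tokens = text.split()
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i : i + n])
        yield hashlib.sha1(window.encode()).hexdigest()


def build_index(corpus: list[str]) -> set[str]:
    """Hash every n-gram of the protected corpus into a set."""
    index = set()
    for doc in corpus:
        index.update(shingles(doc))
    return index


def overlap_ratio(output: str, index: set[str]) -> float:
    """Fraction of the output's n-grams that appear in the corpus."""
    hashes = list(shingles(output))
    if not hashes:
        return 0.0
    return sum(h in index for h in hashes) / len(hashes)


def generate_checked(prompt: str, llm, index: set[str],
                     threshold: float = 0.2, max_tries: int = 3) -> str:
    """Filter-and-retry loop: reject suspiciously verbatim outputs
    and re-sample, refusing after max_tries attempts."""
    for _ in range(max_tries):
        candidate = llm(prompt)  # llm is a hypothetical text-generation callable
        if overlap_ratio(candidate, index) < threshold:
            return candidate
    return "[output withheld: too close to indexed copyrighted text]"
```

At web scale the plain set would have to become a Bloom filter or a sharded index, which is where the Google-Search-level indexing cost comes in; the extra lookup and occasional re-sampling are the compute and latency costs mentioned above.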