zlacker

An LLM can be used for a clean room design so long as all (ALL) of its training data is in the clean room (and consequently does not contain the copyrighted work being reverse engineered).

An LLM trained on the Internet-at-large is also presumably suitable for a clean room design if it can be shown that its training completed prior to the existence of the work being duplicated, and thus could not have been contaminated.

This doesn't detract from the core of your point, which is that LLM output may be copyright-contaminated by the LLM's training data. That's true, but it doesn't necessarily mean that LLM output can never be a valid clean-room reverse engineering effort.

replies(1): >>accoun+pq

>>Boreal+(OP)
> An LLM trained on the Internet-at-large is also presumably suitable for a clean room design if it can be shown that its training completed prior to the existence of the work being duplicated, and thus could not have been contaminated.

This assumes that you are only concerned with one particular work. In practice you need to be sure that you are not copying any copyrighted work at all, unless you have a valid license for it and are abiding by its terms.

replies(1): >>Boreal+Sy

>>accoun+pq
The "clean room" in "clean room reverse engineering" refers to a particular set of trade secrets, yes. You could have a clean room and still infringe if an employee in the room copied any work they had ever seen.

The clean room has to do with licenses and trade secrets, not copyright.