We don't yet know how courts will rule on cases like Does v Github (https://githubcopilotlitigation.com/case-updates.html). LLM-based systems are not even capable of practicing clean-room design (https://en.wikipedia.org/wiki/Clean_room_design). For a maintainer to accept code generated by an LLM is to put the entire community at risk, as well as to endorse a power structure that mocks consent.
An LLM trained on the Internet-at-large is also presumably suitable for a clean room design if it can be shown that its training completed prior to the existence of the work being duplicated, and thus could not have been contaminated.
This doesn't detract from the core of your point, that LLM output may be copyright-contaminated by LLM training data. Yes, but that doesn't necessarily mean that an LLM output cannot be a valid clean-room reverse engineer.
This is assuming that you are only concerned with a particular work when you need to be sure that you are not copying any work that might be copyrighted without making sure to have a valid license that you are abiding by.
The clean room has to do with licenses and trade secrets, not copyright.