Long term, if no one is given credit for their research, creators will either start to wall off their content or stop creating altogether. Both outcomes would be sad.
A humane attribution comment from the AI could go a long way - "I think I read something about this <topic X> in the NYTimes <link> on January 3rd, 2021."
It appears that without attribution, long term, nothing moves forward.
AI loses access to the latest findings from humanity. And so does the public.
The knowledge gets distorted, blended, and reinterpreted a million ways by the time it's given as output.
And the metadata (metaknowledge?) would be larger than the knowledge itself. The AI learnt every single concept it knows by reading online: the structure of grammar, the rules of logic, the meanings of words, how they relate to one another. You simply couldn't cite it all.
The model is fuzzy; that's the learning part. It'll never follow the rules to the letter, the same way humans fuck up all the time.
But a model trained to be literate and to parse meaning could be given the hard data via a vector DB or similar; it could cite sources from there, or as it finds them on the internet. Tbf, this is how they should've trained the model in the first place.
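A minimal sketch of that retrieve-and-cite setup. Real systems would use a learned embedding model and an actual vector database; here a bag-of-words counter stands in for the embeddings, and the passages, sources, and dates are invented for illustration:

```python
# Toy "vector DB" that stores passages alongside source metadata,
# so an answer can carry a humane attribution line.
from collections import Counter
from math import sqrt

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CitedStore:
    def __init__(self):
        self.entries = []  # (vector, passage, source, date)

    def add(self, passage, source, date):
        self.entries.append((embed(passage), passage, source, date))

    def lookup(self, query):
        # Return the best-matching passage plus an attribution string.
        vec = embed(query)
        _, passage, source, date = max(
            self.entries, key=lambda e: cosine(vec, e[0]))
        return passage, (
            f"I think I read something about this in {source} on {date}.")

store = CitedStore()
store.add("Honeybees communicate direction through a waggle dance.",
          "NYTimes", "January 3rd, 2021")
store.add("Octopuses can change skin texture as well as color.",
          "Nature", "March 12th, 2020")

passage, attribution = store.lookup("how do bees communicate direction")
print(attribution)
# -> I think I read something about this in NYTimes on January 3rd, 2021.
```

The point of the sketch is the shape of the answer: the source metadata travels with the passage, so attribution costs one extra field, not a full citation graph.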
But in order to become literate, it needs to read... and we humans constantly reuse phrases we've picked up: "as easy as pie" - oops, copyright.