Sayi "they have the money" is not an argument. It's about the amount of effort that is needed to individually buy, scan, process millions of pages. If that's done for you, why re-do it all?
This is what every company using media are doing (think Spotify, Netflix, but also journal, ad agency, ...). I don't know why people in HN are giving a pass to AI company for this kind of behavior.
As mentioned in The Fucking Article, there's a legal difference between training an AI which largely doesn't repeat things verbatim (ala Anthropic) and redistributing media as a whole (ala Spotify, Netflix, journal, ad agency).