Sure, it's a hard problem, but as others have pointed out frequently in this thread, there is not only "no incentive" to solve it but a clear disincentive. If one can say where the data comes from, one might have to prove that it was used only with permission. And the reason it's a hard problem has nothing to do with metadata volume being greater than content volume. Clearly a book's title and publication year are usually shorter than the book's contents.