books1 and books2 are OpenAI corpora that have never (to my knowledge) had their content revealed.
books3 is public, was developed outside of OpenAI, and we know exactly what's in it.