zlacker

1. devind+(OP)[view] [source] 2023-12-27 17:35:12
for what it's worth, i asked altman directly and he denied using libgen or books2, but also deferred to murati and her team on specifics. but the Q&A wasn't recorded and they haven't answered my follow-ups.
replies(2): >>belter+H8 >>jprete+8a
2. belter+H8[view] [source] 2023-12-27 18:22:03
>>devind+(OP)
Really? Because the GPT-3 paper talks about "...two internet-based books corpora (Books1 and Books2)..." (see pages 8 and 9) - https://arxiv.org/pdf/2005.14165.pdf

It's unclear what those corpora might be, or whether it's the same books2 you are referring to.

replies(1): >>simonw+qx
3. jprete+8a[view] [source] 2023-12-27 18:29:41
>>devind+(OP)
Why would he know the answer in the first place?
replies(1): >>Jensso+IB
4. simonw+qx[view] [source] [discussion] 2023-12-27 20:32:37
>>belter+H8
My guess is that this poster meant books3, not books2.

books1 and books2 are OpenAI corpuses that have never (to my knowledge) had their content revealed.

books3 is public, developed outside of OpenAI and we know exactly what's in it.

replies(1): >>devind+Ode
5. Jensso+IB[view] [source] [discussion] 2023-12-27 20:55:12
>>jprete+8a
The legal liabilities of the training data they use in their flagship product seem like something the CEO should know.
6. devind+Ode[view] [source] [discussion] 2024-01-02 07:04:22
>>simonw+qx
sorry, books3 is indeed what I meant.