zlacker

1. azurez+(OP)[view] [source] 2023-11-20 05:50:00
You know, we are dealing with people, and people think differently, so they won't all move at once. Moving to a new company under Altman is not an easy choice; at a minimum, the new company:

- Does not yet have a big model (needs $$$ and months to train, even if the code is ready)

- Does not have the proprietary code OpenAI has right now

- Does not have labeled data ($$$ and time) or the ChatGPT logs

- Does not have the ChatGPT brand...

replies(1): >>buffer+E1
2. buffer+E1[view] [source] 2023-11-20 05:59:46
>>azurez+(OP)
I thought GPT-4 was not trained on labeled data, but simply on a large volume of text/code. Most of it is publicly accessible: Wikipedia, archives of scientific articles, books, GitHub, plus probably purchased data from text-heavy sites like Reddit.
replies(3): >>enigmu+K2 >>lyu072+g4 >>frabcu+55
3. enigmu+K2[view] [source] [discussion] 2023-11-20 06:06:29
>>buffer+E1
I assume that's a reference to RLHF? Not sure.
4. lyu072+g4[view] [source] [discussion] 2023-11-20 06:14:00
>>buffer+E1
No, it's reinforcement learning from human feedback (RLHF), which involves lots of labeling.
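
For anyone unfamiliar: the "labeling" here means human raters ranking pairs of model outputs, which is used to train a reward model. A minimal sketch of that preference step, with illustrative names only (not OpenAI's actual pipeline):

```python
# Sketch of RLHF preference labeling: raters pick the better of two
# responses, and a reward model learns to score the preferred one higher.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a response embedding; stands in for a fine-tuned LM head."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.score(embedding).squeeze(-1)

def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: the human-preferred ('chosen') response
    should score higher than the 'rejected' one."""
    return -torch.nn.functional.logsigmoid(
        model(chosen) - model(rejected)).mean()

# Each labeled comparison is one unit of the human work being discussed:
# a prompt plus two candidate responses, with a rater picking the better one.
model = RewardModel()
chosen_emb = torch.randn(4, 768)    # embeddings of preferred responses
rejected_emb = torch.randn(4, 768)  # embeddings of rejected responses
loss = preference_loss(model, chosen_emb, rejected_emb)
loss.backward()  # the reward model then guides RL fine-tuning (e.g. PPO)
```

That per-comparison human judgment is what makes the data expensive to replicate.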
5. frabcu+55[view] [source] [discussion] 2023-11-20 06:19:17
>>buffer+E1
Whatever they've built this year presumably uses all the positive/negative feedback on ChatGPT; they have a year's worth of that data now...

Another example is the Be My Eyes data: presumably the vision part of GPT-4 was trained on the archive of data the blind-assistance app has, and that could be an exclusive deal with OpenAI.
