zlacker

1. dvrp+(OP) 2025-07-31 21:41:25
Copying and pasting Sangwu’s answer:

We used two types of datasets for post-training: supervised finetuning data, and preference data for the RLHF stage. You can actually use fewer than 1M samples to significantly boost the aesthetics. Quality matters A LOT. Quantity helps with generalisation and stability of the checkpoints, though.

replies(1): >>lawles+Ai
2. lawles+Ai 2025-08-01 00:12:18
>>dvrp+(OP)
How is data acquired and curated?