zlacker

[parent] [thread] 3 comments
1. blueha+(OP)[view] [source] 2025-07-31 21:22:47
How large was the dataset used for post-training?
replies(1): >>sangwu+a2
2. sangwu+a2[view] [source] 2025-07-31 21:39:02
>>blueha+(OP)
We used two types of datasets for post-training: supervised finetuning data and preference data for the RLHF stage. You can actually use fewer than 1M samples to significantly boost the aesthetics. Quality matters A LOT. Quantity helps with generalisation and stability of the checkpoints, though.
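To make the split concrete, the two record types look roughly like this (field names are illustrative, not our exact schema):

    from dataclasses import dataclass

    @dataclass
    class SFTExample:
        # Supervised finetuning: a prompt paired with one curated, high-quality target.
        prompt: str
        target_image: str

    @dataclass
    class PreferenceExample:
        # Preference data for the RLHF stage: two candidates, one preferred by raters.
        prompt: str
        chosen_image: str
        rejected_image: str

    sft = SFTExample("a misty harbour at dawn", "curated/0412.png")
    pref = PreferenceExample("a misty harbour at dawn",
                             "candidates/0412_a.png",
                             "candidates/0412_b.png")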
replies(1): >>lawles+9k
3. lawles+9k[view] [source] [discussion] 2025-08-01 00:01:18
>>sangwu+a2
How is the data collected?
replies(1): >>sangwu+2l
4. sangwu+2l[view] [source] [discussion] 2025-08-01 00:12:45
>>lawles+9k
The highest-quality finetuning data was hand-curated internally. I would say our post-training pipeline is quite similar to the SeedDream 2.0 ~ 3.0 series from ByteDance. Similar to them, we use extensive quality filters and internal models to get the highest quality possible. Even then, we still hand-pick a final subset.
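Very roughly, the funnel has the shape below; the helper names, fields, and thresholds are just illustrative, not our actual pipeline:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Candidate:
        image_path: str
        width: int
        aesthetic_score: float      # assigned by an internal quality/aesthetic model
        hand_picked: bool = False   # set by human raters in the final pass

    def curate(candidates: List[Candidate],
               min_width: int = 1024,
               score_threshold: float = 0.85) -> List[Candidate]:
        # Stage 1: cheap automated filters (resolution here; dedup, safety, etc. in practice).
        passed = [c for c in candidates if c.width >= min_width]
        # Stage 2: keep only samples the internal scoring models rate highly.
        high_quality = sorted(
            (c for c in passed if c.aesthetic_score >= score_threshold),
            key=lambda c: c.aesthetic_score,
            reverse=True,
        )
        # Stage 3: human raters hand-pick the final finetuning subset.
        return [c for c in high_quality if c.hand_picked]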