zlacker

1. sangwu+(OP)[view] [source] 2025-07-31 21:00:27
Quick napkin math assuming bfloat16 format: 1B params * 16 bits = 16B bits = 2GB. Since it's a 12B parameter model, that's ~24GB. Downcasting from float32 to bfloat16 comes with pretty minimal performance degradation, so we uploaded the weights in bfloat16 format.
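The same napkin math as a tiny helper (my own sketch, not from the release; it counts raw weight bytes only and ignores optimizer state, activations, and framework overhead):

```python
def model_size_gb(n_params: float, bits_per_param: int) -> float:
    """Raw weight memory in decimal GB: params * bits / 8 bits-per-byte / 1e9."""
    return n_params * bits_per_param / 8 / 1e9

# 12B parameters in bfloat16 (16 bits each) vs. float32 (32 bits each)
print(model_size_gb(12e9, 16))  # 24.0 GB
print(model_size_gb(12e9, 32))  # 48.0 GB
```

So bfloat16 halves the download/VRAM footprint relative to float32.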