>>bangal+y91
Quick napkin math assuming the bfloat16 format: 1B params * 16 bits = 16B bits = 2GB.
Since it's a 12B parameter model, that works out to ~24GB. Downcasting from float32 to bfloat16 comes with pretty minimal performance degradation, so we uploaded the weights in bfloat16.
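Here's the same arithmetic as a quick Python sketch (the `weight_size_gb` helper is just for illustration; the torch lines at the end show the kind of float32 -> bfloat16 cast being described, not the actual upload pipeline):

```python
import torch

def weight_size_gb(num_params: float, bits_per_param: int) -> float:
    """Raw weight footprint: params * bits per param, converted to GB."""
    return num_params * bits_per_param / 8 / 1e9

print(weight_size_gb(12e9, 16))  # bfloat16: 24.0 GB
print(weight_size_gb(12e9, 32))  # float32:  48.0 GB

# Casting float32 weights to bfloat16 halves the storage. bfloat16
# keeps float32's 8-bit exponent, so dynamic range is preserved and
# only mantissa precision is lost -- hence the minimal degradation.
w = torch.randn(4096, 4096)      # float32 by default
w_bf16 = w.to(torch.bfloat16)    # same values, half the bytes
print(w.element_size(), w_bf16.element_size())  # 4, 2
```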