Often, training is done in FP16, and the weights are then quantized down to FP8 or FP4 for distribution.
I asked ChatGPT for an explanation, and it said bfloat16 has a higher dynamic range (like FP32) but less precision than FP16.
What does that mean for image generation, and why was bfloat16 chosen over FP16?
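For reference, here's a quick PyTorch check (assuming torch is installed) that shows the range vs precision tradeoff it was describing:

```python
# Compare float16 and bfloat16 numeric limits (requires PyTorch).
import torch

for dtype in (torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    # max = largest representable value (range), eps = gap between 1.0 and
    # the next representable number (a rough measure of precision)
    print(f"{dtype}: max={info.max:.3e}, eps={info.eps:.3e}")

# torch.float16:  max ~6.55e+04, eps ~9.77e-04  (narrow range, finer steps)
# torch.bfloat16: max ~3.39e+38, eps ~7.81e-03  (FP32-like range, coarser steps)
```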