zlacker

> distributed training

Unfortunately this isn't a thing. E.g., syncing batch norm statistics across machines adds so much latency that your GPUs sit idle. Unless all your hardware is in the same building, training a single model would be so inefficient that it's not worth it.
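
To put rough numbers on the idle-GPU point, here's a back-of-envelope sketch. Everything in it is an assumption for illustration: the function name `gpu_utilization`, the 100 ms of compute per step, the 50 blocking sync points per step (think one per synced batch norm layer), and the round-trip times. It also ignores bandwidth limits and any overlap of communication with compute, so treat it as a toy model, not a benchmark.

```python
def gpu_utilization(compute_ms: float, sync_points: int, round_trip_ms: float) -> float:
    """Fraction of a training step spent computing, assuming every
    sync point blocks the GPU for one full network round trip."""
    step_ms = compute_ms + sync_points * round_trip_ms
    return compute_ms / step_ms


# Hypothetical numbers: 100 ms of compute per step, 50 blocking syncs per step.
same_building = gpu_utilization(compute_ms=100.0, sync_points=50, round_trip_ms=0.01)
over_internet = gpu_utilization(compute_ms=100.0, sync_points=50, round_trip_ms=50.0)

print(f"same building:     {same_building:.1%}")  # 99.5%
print(f"over the internet: {over_internet:.1%}")  # 3.8%
```

Under those made-up numbers the GPUs spend almost the whole step waiting on the network once the round trip goes from tens of microseconds to tens of milliseconds.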