H100s have 80GB of HBM on a 5120-bit bus, and the SXM variant adds NVLink so eight of them can be linked together in a single server.
That's a HUGE difference in bandwidth for anything where inference has to be spread across multiple GPUs, which all the big LLMs are. And even more of a difference when training is in play. A rough way to see this is to benchmark the all-reduce that tensor-parallel inference runs on every layer, as in the sketch below.
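A minimal sketch, assuming PyTorch with NCCL on a multi-GPU node (the script name and buffer size are just illustrative); launch with `torchrun --nproc_per_node=8 allreduce_bench.py`. It times repeated all-reduces over a large fp16 buffer, which is the collective that multi-GPU inference and training lean on, so the number it prints tracks how much the interconnect (NVLink vs. PCIe) matters.

```python
import os
import time
import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE; NCCL is the GPU backend.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # 1 GiB of fp16 data per rank -- stand-in for sharded activations/gradients.
    n = 512 * 1024 * 1024
    x = torch.randn(n, dtype=torch.float16, device="cuda")

    # Warm up so NCCL sets up its communicators before timing.
    for _ in range(5):
        dist.all_reduce(x)
    torch.cuda.synchronize()

    iters = 20
    start = time.time()
    for _ in range(iters):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    elapsed = time.time() - start

    if rank == 0:
        gib = x.numel() * x.element_size() / 2**30
        # Buffer size / time: algorithm-level throughput, not raw link bandwidth.
        print(f"~{iters * gib / elapsed:.1f} GiB/s per-rank all-reduce throughput")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Run the same script on an NVLink-connected SXM box and on GPUs that only talk over PCIe, and the gap in the reported throughput is the bandwidth difference being described.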