Megatron is a large language model -- unfortunately it seems they really undertrained it for its parameter count, so it was more of a numbers game of "hey, look how big this model is!" when they first released it.
Many modern models are far more efficient for inference IIRC, though I guess it remains a good exercise in "how much can we fit through this silicon?" engineering. :D