On one hand, LLMs do require a significant amount of compute to train. On the other hand, if you amortize the training cost across all user sessions, is it really that big a deal? And that’s not even factoring in Moore’s law and incremental improvements in training efficiency.
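To make the amortization point concrete, here’s a rough back-of-envelope sketch in Python. Every number in it is a made-up placeholder, so treat it as the shape of the argument rather than an actual estimate:

```python
# Back-of-envelope amortization of a one-time training cost.
# All inputs below are hypothetical placeholders, not measured figures.

training_cost_usd = 100e6        # assumed one-time training cost
model_lifetime_days = 365        # assumed period the model serves traffic
sessions_per_day = 50e6          # assumed user sessions per day

total_sessions = sessions_per_day * model_lifetime_days
amortized_cost_per_session = training_cost_usd / total_sessions

print(f"Amortized training cost per session: ${amortized_cost_per_session:.5f}")
# With these made-up inputs, the one-time training cost works out to a fraction
# of a cent per session -- that is the shape of the argument, whatever the
# exact numbers turn out to be.
```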
AI inference is so costly at scale that one can easily see data centers going from using 4% of total electricity to 8% within the next decade. That will start to have serious effects on the power grid, and it basically requires planning many years in advance to set up new power plants and such.
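For a sense of scale, here’s what those percentages translate to in annual energy and average power draw. The total-generation figure is an assumed round number, not a sourced statistic:

```python
# Rough sketch of what moving from 4% to 8% of total electricity implies.
# total_generation_twh is an assumed round number, not a sourced figure.

total_generation_twh = 4000      # assumed annual electricity generation (TWh)

for share in (0.04, 0.08):
    demand_twh = total_generation_twh * share
    # Convert annual energy (TWh) to an average continuous power draw (GW).
    avg_power_gw = demand_twh * 1000 / 8760
    print(f"{share:.0%} of grid -> {demand_twh:.0f} TWh/yr (~{avg_power_gw:.0f} GW average)")
```

Under those assumptions the jump is on the order of an extra ~18 GW of continuous draw, which is why the lead time for new generation capacity matters.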
"Moore's law and incremental improvements' are irrelevant in the face of scaling laws. Since we aren't at AGI yet, every extra bit of compute will be dumped back into scaling the models and improving performance.