Really, these sorts of ideas feel like we're getting to the "put everything on the blockchain!" phase. "Let's spend more GPU power for creating speech for the Sims than it takes to run the Sims itself!"
I don't. I assume it would need to be running constantly to know when it wants to speak, and there will be multiple actors on screen at all times. Do we have actual estimates for how much a single response costs in ChatGPT? All I know is that it takes a lot of video cards to power that system.
> If you have some optimized LLMs running on the client
Do these currently exist? I was under the impression that the tech to date is compute-intensive if you're looking for near-real-time interaction.