Models can be trained to generate tokens with many different meanings, including visual, auditory, textual, and locomotive. Those alone seem sufficient to emulate a human to me.
It would certainly be cool to integrate some subsystems like a symbolic reasoner or calculator or something, but the bitter lesson tells us that we'd be better off just waiting for advancements in computing power.