Okay, ChatGPT is only text-to-text, but Google & Co are adding more modalities now, including images, audio and robotics. I think one missing step is to fuse the training and inference regimes into one, just as in animals. That probably requires something other than the usual transformer-based token predictors.
Just as ELIZA can be said to be faking it, ChatGPT is faking it too, only in a different way.