You mean besides a few layers of LLMs near input and output that deal with tokens? We have the rest of the layers:
1. Syntax
2. Semantics
3. Pragmatics
4. Semiotics
These are the layers you need to solve.
Saussure already pointed out these issues over a century ago, and linguists turned ML researchers like Stuart Russell and Paul Smolensky tried in vain to resolve them.
It basically took 60 years just to crack syntax at scale, and the other layers are still fairly far away.
Furthermore, syntax is not yet a solved problem in most languages.
Try communicating with GPT-4o in colloquial Bhojpuri, Koshur, or Dogri, let alone even less-represented languages and dialects.