I think that's almost completely backwards. The input and output layers just convert between natural language and embeddings, i.e. they shift the format of the language. It's the operations on the embeddings in between where meaning (locations in vector space) is transformed.
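Here's a toy sketch of the distinction. Everything in it is a made-up assumption for illustration: the four-word vocabulary, the 2-D embeddings, and the rotation standing in for the hidden layers. Real models learn all of these, and their middle layers are far more than a single linear map. The point is just that embed/unembed on their own round-trip a token unchanged, and only the middle step moves it to a different location.

```python
import math

# Hypothetical vocabulary and hand-picked 2-D embeddings (illustration only).
VOCAB = ["cold", "warm", "hot", "freezing"]
EMBED = {
    "cold": (1.0, 0.0),
    "warm": (0.0, 1.0),
    "hot": (-1.0, 0.0),
    "freezing": (0.0, -1.0),
}

def embed(token):
    """Input layer: pure format conversion, token -> point in vector space."""
    return EMBED[token]

def transform(vec, angle=math.pi / 2):
    """Stand-in for the hidden layers: moves the point somewhere else (here, a rotation)."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def unembed(vec):
    """Output layer: format conversion back, picking the token with the largest dot product."""
    return max(VOCAB, key=lambda t: EMBED[t][0] * vec[0] + EMBED[t][1] * vec[1])

def run(token):
    return unembed(transform(embed(token)))

# Without the middle step, nothing changes: every token round-trips to itself.
print([unembed(embed(t)) for t in VOCAB])  # ['cold', 'warm', 'hot', 'freezing']
# With it, the output token differs because the point moved in vector space.
print(run("cold"))  # warm
```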