It gets expensive fast, but not messy: these things scale horizontally really well. All the state is encapsulated in the request, so there's no replication, synchronisation, or user data to worry about. I'd rather have the job of horizontally scaling llama2 than a relational database.
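To make the "state lives in the request" point concrete, here's a rough sketch of why routing is trivial in that setup. The replica URLs and the `make_router` helper are made up for illustration, and it only picks a target rather than doing any real HTTP:

```python
from itertools import cycle

# Hypothetical replica addresses; every replica holds the same model weights.
REPLICAS = ["http://llm-0:8000", "http://llm-1:8000", "http://llm-2:8000"]

def make_router(replicas):
    """Return a function that assigns requests to replicas round-robin."""
    ring = cycle(replicas)

    def route(request):
        # The request carries the full prompt/context, so no session
        # affinity is needed: any replica can serve any request.
        return next(ring), request

    return route

route = make_router(REPLICAS)
targets = [route({"prompt": f"question {i}"})[0] for i in range(4)]
```

Adding capacity here is just appending another URL to the list. With a relational database you'd also have to think about replication and consistency before a new node could serve anything.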
My point is that doing that dynamically is still a lot of work compared to just calling a single endpoint and having all of that handled for you.
But for sure, this is a very decent use case for horizontal scaling.