Do LLM providers cache responses to frequently asked prompts without changing the number of tokens billed to the customer?
What I really want to know about is caching large prompt prefixes. Do providers let you manage this somehow? What about Llama and DeepSeek?
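To be concrete about what I mean by "managing" it: when you self-host a Llama or DeepSeek checkpoint, engines like vLLM expose a prefix-caching switch so that a long shared prefix (a system prompt or FAQ document) only gets prefilled once. This is just a sketch of my understanding; the `enable_prefix_caching` flag and the model name are examples, not something I've confirmed matches the hosted APIs I'm asking about.

```python
# Minimal sketch (my assumption of how this works) of managed prefix caching
# when self-hosting a model with vLLM, as opposed to a hosted API.
from vllm import LLM, SamplingParams

# Hypothetical shared prefix reused across many requests: a long FAQ context.
SHARED_PREFIX = (
    "You are a support assistant. Answer using only the FAQ below.\n"
    "FAQ: Passwords can be reset from the account page. Refunds take 5 days.\n"
)

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model name
    enable_prefix_caching=True,                # reuse KV-cache blocks for the common prefix
)

params = SamplingParams(temperature=0.2, max_tokens=256)

questions = ["How do I reset my password?", "What is your refund policy?"]
prompts = [SHARED_PREFIX + "User question: " + q for q in questions]

# After the first request, later requests should hit the cached KV blocks for
# SHARED_PREFIX, so only the differing suffix tokens are prefilled.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```

That kind of explicit control is what I'm asking whether the hosted APIs give you, and how (or whether) it shows up on the bill.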