zlacker

[parent] [thread] 1 comments
1. walter+(OP)[view] [source] 2025-05-22 09:30:58
> Using an LLM and caching eg FAQs can save a lot of token credits

Do LLM providers use caches for FAQs, without changing the number of tokens billed to customer?

replies(1): >>EGreg+1q1
2. EGreg+1q1[view] [source] 2025-05-22 19:29:12
>>walter+(OP)
No, why would they? You're supposed to maintain that cache yourself.

What I really want to know about is caching the large shared prefixes of prompts. Do providers let you manage this somehow? What about Llama and DeepSeek?
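
The "maintain that cache yourself" part is straightforward for exact-repeat FAQs: key a client-side store on a normalized form of the prompt and only call the billed API on a miss. A minimal sketch (the `llm_call` function here is a hypothetical stand-in for whatever billed API call you use):

```python
import hashlib

def _key(prompt: str) -> str:
    # Normalize whitespace and case so trivially different phrasings
    # of the same FAQ map to the same cache entry.
    return hashlib.sha256(prompt.strip().lower().encode()).hexdigest()

class ResponseCache:
    """Client-side response cache: repeated prompts skip the billed API call."""

    def __init__(self, llm_call):
        self.llm_call = llm_call  # hypothetical billed call, e.g. a chat API
        self.store = {}
        self.hits = 0
        self.misses = 0

    def ask(self, prompt: str) -> str:
        k = _key(prompt)
        if k in self.store:
            self.hits += 1
            return self.store[k]
        self.misses += 1
        answer = self.llm_call(prompt)
        self.store[k] = answer
        return answer

# Stubbed model call for demonstration; no tokens are billed here.
cache = ResponseCache(lambda p: f"answer to: {p.strip()}")
a1 = cache.ask("What is your refund policy?")
a2 = cache.ask("  what is your refund policy? ")  # same key after normalization
print(cache.hits, cache.misses)
```

Note this only helps with verbatim (or near-verbatim) repeats; it is a different mechanism from provider-side prefix caching, which discounts the reused prompt prefix rather than skipping the call entirely.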