Deepinfra token usage updates every time you switch to the tab if it is opened to the usage page so it is possible to see updates even every second
And before you ask: yes, for cached content and batch completion discounts you can accommodate both—just needs a bit of logic in your completion-layer code.
You have less than $10 million in spend you will be treated worse than cattle because at least farmers feed their cattle before they are milked
I have LiteLLM server running locally with Langfuse to view traces. You configure LiteLLM to connect directly to providers' APIs. This has the added benefit of being able to create LiteLLM API keys per project that proxies to different sets of provider API keys to monitor or cap billing usage.
I use https://github.com/LLemonStack/llemonstack/ to spin up local instances of LiteLLM and Langfuse.