zlacker

[parent] [thread] 10 comments
1. jeswin+(OP)[view] [source] 2025-05-06 15:25:20
Now if only there were a way to add prepaid credits and monitor usage in near real-time on a dashboard, like every other vendor. Hey Google, are you listening?
replies(7): >>greena+8 >>Hawken+u >>slig+x >>cchanc+s1 >>tucnak+R1 >>therea+R2 >>simple+Vf
2. greena+8[view] [source] 2025-05-06 15:26:08
>>jeswin+(OP)
You can do that by using deepinfra to manage your billing. It's pay-as-you-go and they have a pass-through virtual target for Google Gemini.

Deepinfra's token usage updates every time you switch to the tab if it's open to the usage page, so you can see updates as often as every second.
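
A minimal sketch of that setup, assuming DeepInfra's OpenAI-compatible endpoint really does pass through to Gemini as described here; the model id is a guess, so check DeepInfra's model list:

    # Call Gemini through DeepInfra's OpenAI-compatible endpoint,
    # so billing stays on the DeepInfra pay-as-you-go dashboard.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.deepinfra.com/v1/openai",
        api_key="di-...",  # DeepInfra key
    )
    resp = client.chat.completions.create(
        model="google/gemini-2.5-pro",  # assumed pass-through model id
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)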

3. Hawken+u[view] [source] 2025-05-06 15:27:23
>>jeswin+(OP)
You can do this with https://openrouter.ai/
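
Since OP's ask was near-real-time usage monitoring: OpenRouter exposes a key-info endpoint you can poll for credits consumed. A rough sketch; the endpoint path and field names are from memory of their docs, so verify before relying on it:

    # Poll OpenRouter for usage/limit on the current API key.
    import requests

    resp = requests.get(
        "https://openrouter.ai/api/v1/auth/key",
        headers={"Authorization": "Bearer sk-or-..."},
    )
    info = resp.json().get("data", {})
    print("usage (USD):", info.get("usage"))  # credits consumed so far
    print("limit (USD):", info.get("limit"))  # None = no cap set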
replies(1): >>pzo+Wg
4. slig+x[view] [source] 2025-05-06 15:27:53
>>jeswin+(OP)
In the meantime, I'm using OpenRouter.
5. cchanc+s1[view] [source] 2025-05-06 15:32:26
>>jeswin+(OP)
OpenRouter. I don't think anyone should use Google direct till they fix their shit billing.
replies(1): >>greena+93
6. tucnak+R1[view] [source] 2025-05-06 15:35:01
>>jeswin+(OP)
You need LLM ops. YC happens to have invested in Langfuse; if you're serious about tracking metrics, you'll appreciate the rest of it, too.

And before you ask: yes, you can accommodate both cached-content and batch-completion discounts; it just needs a bit of logic in your completion-layer code.
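
For illustration, that bit of logic can be as small as this; the rates and discount factors below are placeholders, not Google's actual pricing:

    # Per-request cost with cached-input and batch discounts applied.
    RATE_IN = 1.25 / 1e6    # $/input token (placeholder)
    RATE_OUT = 10.00 / 1e6  # $/output token (placeholder)
    CACHE_FACTOR = 0.25     # cached input billed at 25% (assumed)
    BATCH_FACTOR = 0.50     # batch requests billed at 50% (assumed)

    def request_cost(tokens_in, tokens_cached, tokens_out, batch=False):
        fresh = tokens_in - tokens_cached
        cost = (fresh * RATE_IN
                + tokens_cached * RATE_IN * CACHE_FACTOR
                + tokens_out * RATE_OUT)
        return cost * (BATCH_FACTOR if batch else 1.0)

    print(f"${request_cost(12_000, 10_000, 800, batch=True):.6f}")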

7. therea+R2[view] [source] 2025-05-06 15:41:44
>>jeswin+(OP)
Is this on Google AI Studio or Google Vertex or both?
8. greena+93[view] [source] [discussion] 2025-05-06 15:43:30
>>cchanc+s1
Even afterwards. Avoid paying directly if you can because they generally could not care less about individuals.

If you have less than $10 million in spend, you will be treated worse than cattle, because at least farmers feed their cattle before they are milked.

9. simple+Vf[view] [source] 2025-05-06 16:52:22
>>jeswin+(OP)
You can do this with LLM proxies like LiteLLM, e.g. Cursor -> LiteLLM -> LLM provider API.

I have a LiteLLM server running locally with Langfuse to view traces. You configure LiteLLM to connect directly to the providers' APIs. This has the added benefit that you can create per-project LiteLLM API keys, each proxying to a different set of provider API keys, to monitor or cap billing usage.
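
A sketch of the per-project key part, assuming a LiteLLM proxy on its default port with a master key configured; field names follow LiteLLM's /key/generate docs, and "project-a" and the budget are made-up values:

    # Mint a budget-capped virtual key for one project.
    import requests

    resp = requests.post(
        "http://localhost:4000/key/generate",
        headers={"Authorization": "Bearer sk-master-..."},  # proxy master key
        json={
            "key_alias": "project-a",      # hypothetical project name
            "models": ["gemini-2.5-pro"],  # aliases from the proxy config
            "max_budget": 25.0,            # hard USD cap for this key
        },
    )
    print(resp.json()["key"])  # give this sk-... key to the project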

I use https://github.com/LLemonStack/llemonstack/ to spin up local instances of LiteLLM and Langfuse.

10. pzo+Wg[view] [source] [discussion] 2025-05-06 16:57:24
>>Hawken+u
But if you want to use the Google SDK (python-genai, js-genai) rather than the OpenAI SDK, you cannot use OpenRouter (I found the Google API more feature-rich when using different modalities like audio/images/video). Also, I'm not sure it works if you are developing an app and need higher rate limits: what's the typical rate limit via OpenRouter?
replies(1): >>pzo+wh
11. pzo+wh[view] [source] [discussion] 2025-05-06 17:00:07
>>pzo+Wg
Also, for some reason, when I tested a simple prompt (a few words, no system prompt) with one image attached, OpenRouter charged me ~1700 tokens, whereas calling directly via python-genai it's ~400 tokens. Also keep in mind they charge a small markup fee when you top up your account.
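
One way to sanity-check that discrepancy is to ask the Gemini API itself what the prompt costs, then compare with what OpenRouter bills; a sketch assuming the python-genai SDK and a local image file:

    # Count tokens for a text+image prompt directly against Google.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="AIza...")  # direct Google API key
    image = types.Part.from_bytes(
        data=open("photo.jpg", "rb").read(), mime_type="image/jpeg"
    )
    result = client.models.count_tokens(
        model="gemini-2.0-flash",  # assumed model id
        contents=["describe this", image],
    )
    print(result.total_tokens)  # compare with OpenRouter's billed count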