zlacker

1. pseudo+(OP) 2025-12-05 20:01:52
I'm really fascinated by the opportunities to analyze videos. How few tokens a video compresses down to, and what you can reason about across those tokens, is incredible.
2. minima+s1 2025-12-05 20:09:19
>>pseudo+(OP)
The actual token calculations for input videos with Gemini 3 Pro are...confusing.

https://ai.google.dev/gemini-api/docs/media-resolution

3. pseudo+Mo 2025-12-05 22:10:20
>>minima+s1
That is because it isn't actually tokens that are fed into the model for non-text input. Text is tokenized, and each token maps to a specific embedding vector. With other media, they've trained encoders that analyze the media and produce a sequence of vectors in the same "format" (the same embedding space) as the token vectors, but none of those vectors is ever actually a token.
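To make the distinction concrete, here's a toy sketch in Python. It assumes a single random linear projection standing in for the trained media encoder (real encoders are much deeper), and all sizes (`D_MODEL`, `VOCAB_SIZE`, `PATCH_SIZE`) are made up for illustration. Text goes through a table lookup keyed by token ID; a video frame's patches go through the encoder and come out as vectors of the same width, with no token ID anywhere.

```python
import random

D_MODEL = 8       # hypothetical embedding width shared by text and media
VOCAB_SIZE = 100  # hypothetical text vocabulary size
PATCH_SIZE = 12   # hypothetical number of raw values in one flattened frame patch

rng = random.Random(0)

# Text path: one learned row per token ID (random here, learned in a real model).
embedding_table = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(VOCAB_SIZE)]

def embed_text(token_ids):
    """Each discrete token ID just selects its row from the table."""
    return [embedding_table[t] for t in token_ids]

# Media path: a stand-in "encoder" (one linear layer, an assumption) that maps
# raw patch values into the same D_MODEL-dimensional space. No token ID involved.
projection = [[rng.gauss(0, 0.1) for _ in range(PATCH_SIZE)] for _ in range(D_MODEL)]

def encode_patch(patch):
    """Matrix-vector product: PATCH_SIZE raw values -> D_MODEL-wide vector."""
    return [sum(w * x for w, x in zip(row, patch)) for row in projection]

def embed_frame(patches):
    return [encode_patch(p) for p in patches]

def build_sequence(token_ids, frame_patches):
    """Concatenate text and frame vectors into one input sequence for the model."""
    return embed_text(token_ids) + embed_frame(frame_patches)

if __name__ == "__main__":
    text = [5, 17, 42]
    frame = [[rng.random() for _ in range(PATCH_SIZE)] for _ in range(4)]
    seq = build_sequence(text, frame)
    print(len(seq), {len(v) for v in seq})  # 7 positions, all D_MODEL wide
```

The model downstream only ever sees the sequence of vectors, so it can't tell (and doesn't care) that the last four never had a token ID.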

Most companies publish rules for how many tokens a piece of media should "cost", but those numbers usually aren't exact.
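Those published rules usually boil down to a flat formula like the one below. This is a hypothetical sketch: the field names and the example numbers are made up and are NOT Gemini's actual rates (the media-resolution page linked above has the real ones). The point is that the billed count is an accounting convention, not necessarily the number of vectors the encoder actually emits.

```python
import math
from dataclasses import dataclass

@dataclass
class VideoCostRule:
    """Hypothetical per-provider accounting rule; numbers are caller-supplied."""
    frames_per_second: float      # how often frames are sampled from the video
    tokens_per_frame: int         # flat charge per sampled frame (resolution-dependent)
    audio_tokens_per_second: int  # flat charge for the audio track

def estimate_video_tokens(duration_s: float, rule: VideoCostRule) -> int:
    """Billed-token estimate: sampled frames * per-frame charge + audio charge."""
    frames = math.ceil(duration_s * rule.frames_per_second)
    audio = math.ceil(duration_s * rule.audio_tokens_per_second)
    return frames * rule.tokens_per_frame + audio

if __name__ == "__main__":
    # Made-up rates: 1 frame/s at 100 tokens/frame, plus 10 audio tokens/s.
    rule = VideoCostRule(frames_per_second=1.0, tokens_per_frame=100, audio_tokens_per_second=10)
    print(estimate_video_tokens(60, rule))  # 6600
```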
