zlacker

[return to "Gemini Robotics On-Device brings AI to local robotic devices"]
1. polski+Ka1[view] [source] 2025-06-24 20:52:54
>>meetpa+(OP)
What is the model architecture? I'm assuming it's far from LLMs, but I'm curious to know more. Can anyone provide links that describe VLA architectures?
2. KoolKa+yc1[view] [source] 2025-06-24 21:03:07
>>polski+Ka1
Actually very close to one I'd say.

It's a "visual language action" VLA model "built on the foundations of Gemini 2.0".

Since Gemini 2.0 has native language, audio, and video support, I suspect it has been adapted to include native "action" data too, perhaps only as output fine-tuning rather than as input/output at the training stage (given its Gemini 2.0 foundation).

Natively multimodal LLMs are basically brains.
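For what it's worth, here's a minimal sketch of how "native action output" works in published VLA models like RT-2: continuous robot actions are discretized into bins and mapped onto reserved vocabulary tokens, so the model decodes actions exactly the way it decodes text. Google hasn't published Gemini Robotics internals, so the names and numbers below are made up for illustration only.

```python
# Hypothetical action tokenization, in the style of RT-2-like VLA models.
# Continuous action dimensions (e.g. end-effector deltas) are clamped to a
# range, binned, and offset into a reserved region of the token vocabulary.

NUM_BINS = 256               # resolution of the discretization (illustrative)
ACTION_TOKEN_BASE = 100_000  # hypothetical offset of action tokens in the vocab

def action_to_tokens(action, low=-1.0, high=1.0):
    """Map each continuous action dimension to one discrete action token."""
    tokens = []
    for a in action:
        a = min(max(a, low), high)  # clamp to the supported range
        bin_idx = int((a - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(ACTION_TOKEN_BASE + bin_idx)
    return tokens

def tokens_to_action(tokens, low=-1.0, high=1.0):
    """Invert the mapping: decode action tokens back to continuous values."""
    return [low + (t - ACTION_TOKEN_BASE) / (NUM_BINS - 1) * (high - low)
            for t in tokens]
```

The point is that once actions are tokens, "action" becomes just another output modality for the same autoregressive decoder; fine-tuning teaches it when to emit those tokens.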

3. quantu+lt1[view] [source] 2025-06-24 23:12:47
>>KoolKa+yc1
> Natively multimodal LLM's are basically brains.

Absolutely not.

4. KoolKa+L92[view] [source] 2025-06-25 07:56:12
>>quantu+lt1
Lol, keep telling yourself that. It's not a human brain, nor is it necessarily a very intelligent brain, but it is a brain nonetheless.
5. quantu+oM3[view] [source] 2025-06-25 19:19:10
>>KoolKa+L92
Not a useful commentary. ANNs and BNNs are only slightly correlated. The fact that you want to believe it is a brain says a lot about you, but it doesn't make a model a brain.

The only suggestion I have is "study more".

6. KoolKa+EG6[view] [source] 2025-06-26 22:30:10
>>quantu+oM3
They're not merely slightly correlated.

If it looks like a duck and quacks like a duck...

Just because it is alien to you does not mean it is not a brain; please go look up the definition of the word.

And my comment is useful: a VLA implies it is processing its input and output natively, something a brain does, hence my comment.
