Zebra-Llama: Towards Efficient Hybrid Models

>>mirrir+(OP)
Due to perverse incentives and the historical nature of models over-claiming accuracy, it's very hard to believe anything until it is open source and can be tested out

that being said, I do very much believe that computational efficiency of models is going to go up [correction] drastically over the coming months, which does pose interesting questions over nvidia's throne

*previously miswrote and said computational efficiency will go down

>>aditya+79
I don't doubt the increase in efficiency. I doubt the "drastically".

We already see models become more and more capable per weight and per unit of compute. I don't expect a state-change breakthrough. I expect: more of the same. A SOTA 30B model from 2026 is going to be ~30% better than one from 2025.

Now, expecting that to hurt Nvidia? Delusional.

No one is going to stop and say "oh wow, we got more inference efficiency - now we're going to use less compute". A lot of people are going to say "now we can use larger and more powerful models for the same price" or "with cheaper inference for the same quality, we can afford to use more inference".

zlacker