That being said, I do very much believe that the computational efficiency of models is going to go up [correction] drastically over the coming months, which raises interesting questions about Nvidia's throne
*previously miswrote and said computational efficiency will go down
If computational efficiency goes up (thanks for the correction) and CPU inference becomes viable for most practical applications, dedicated GPUs (or accelerators) may become unnecessary for most workloads
We already see models becoming more and more capable per weight and per unit of compute. I don't expect a step-change breakthrough; I expect more of the same. A SOTA 30B model from 2026 is going to be ~30% better than one from 2025.
Now, expecting that to hurt Nvidia? Delusional.
No one is going to stop and say "oh wow, we got more inference efficiency - now we're going to use less compute". A lot of people are going to say "now we can use larger and more powerful models for the same price" or "with cheaper inference for the same quality, we can afford to use more inference".
Right now, Claude is good enough. If LLM development hit a magical wall and never got any better, Claude would still be good enough to be terrifically useful, and there are diminishing returns on how much additional good we get out of it being at $benchmark.
Say we're satisfied with that... well, how many years until efficiency gains from one side and consumer hardware from the other meet in the middle, so that "good enough for everybody" open models are available to anyone willing to pay for a $4,000 MacBook (and after another couple of years, a $1,000 MacBook, and several more, a fancy wristwatch)?
Point being, unless we get to a point where we start developing "models" that deserve civil rights and citizenship, the days are numbered for NEEDING cloud infrastructure and datacenters full of racks and racks of $x0,000 hardware.
I strongly believe the top of the S-curve is nigh, and with it these trillion-dollar ambitions are going to crumble. Everybody is going to want a big-ass GPU and a ton of RAM, but that's going to quickly become boring: open models will exist that eat everybody's lunch, and the trillion-dollar companies trying to beat them with a premium product won't stack up outside of niche cases and much more ordinary cloud-compute motivations.
People said that "good enough" about GPT-4. Now you say that about Claude Opus 4.5. How long before the treadmill turns, and the very same Opus 4.5 becomes "the bare minimum" - the least capable AI you would actually consider using for simple and unimportant tasks?
We have miles and miles of AI advancements ahead of us. The end of that road isn't "good enough". It's "too powerful to be survivable".
> Zebra-Llama is a family of hybrid large language models (LLMs) proposed by AMD that...
Hmmm
I don't know what's so special about this paper.
- They claim to use MLA to reduce the KV cache by 90%. Yeah, Deepseek invented that for Deepseek V2 (and kept it for V3, Deepseek R1, etc.)
- They claim to use a hybrid linear attention architecture. So does Deepseek V3.2, and that was weeks ago. Or Granite 4, if you want to go even further back. Or Kimi Linear. Or Qwen3-Next.
- They claim to save a lot of money by not doing a full multi-million-dollar pre-training run. Well, so did Deepseek V3.2... Deepseek hasn't done a full $5.6M pretraining run since Deepseek V3 in 2024. Deepseek R1 is just a $294K post-train on top of the expensive V3 pretrain run, and Deepseek V3.2 is just a hybrid-linear-attention post-train run. I don't know the exact price, but it's probably just a few hundred thousand dollars as well.
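For a sense of where that ~90% KV-cache number comes from, here's a back-of-envelope sketch. The dimensions are illustrative, loosely modeled on Deepseek V2's published shapes (latent dim 512, decoupled RoPE dim 64), not anything from Zebra-Llama itself:

```python
# Rough KV-cache comparison: standard multi-head attention vs.
# MLA-style latent compression. Numbers are illustrative only.

BYTES = 2  # fp16/bf16

def mha_kv_bytes_per_token(n_layers, n_heads, head_dim, bytes_per=BYTES):
    # Standard attention caches a full key AND value vector per head.
    return n_layers * 2 * n_heads * head_dim * bytes_per

def mla_kv_bytes_per_token(n_layers, d_latent, d_rope, bytes_per=BYTES):
    # MLA caches one shared compressed latent per layer plus a small
    # decoupled RoPE key; full K/V are re-projected from the latent.
    return n_layers * (d_latent + d_rope) * bytes_per

# Deepseek-V2-like shapes: 60 layers, 128 heads of dim 128,
# latent dim 512, decoupled RoPE key dim 64.
mha = mha_kv_bytes_per_token(60, 128, 128)
mla = mla_kv_bytes_per_token(60, 512, 64)
print(f"MHA: {mha/1e6:.2f} MB/token")   # ~3.93 MB/token
print(f"MLA: {mla/1e6:.2f} MB/token")   # ~0.07 MB/token
print(f"cache reduction: {1 - mla/mha:.1%}")
```

With these toy numbers the reduction lands north of 90%, which is why long-context serving gets so much cheaper; the real figure depends on the baseline (MHA vs. GQA) and the exact dims.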
Hell, GPT-5, o3, o4-mini, and gpt-4o are all post-trains on top of the same expensive pre-train run that produced gpt-4o in 2024. That's why they all have the same knowledge cutoff date.
I don't really see anything new or interesting in this paper that Deepseek V3.2 hasn't already sort of done (just at a bigger scale). Not exactly the same, but is there anything amazingly new here that's not in Deepseek V3.2?
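The reason all these labs converge on linear attention is the same: dropping the softmax lets you replace the quadratic attention matrix with a fixed-size running state. A toy sketch of the core identity (no feature maps or normalization tricks, which real models do add):

```python
# Toy demo: causal "linear attention" computed two ways.
# Parallel form is O(T^2); recurrent form is O(T) with a d*d state.
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel (quadratic) form: full T x T score matrix, causal mask.
scores = q @ k.T                      # (T, T)
mask = np.tril(np.ones((T, T)))       # causal mask
out_parallel = (scores * mask) @ v    # (T, d)

# Recurrent (linear) form: accumulate sum of k_t (outer) v_t and
# query the constant-size state instead of the growing KV history.
state = np.zeros((d, d))
out_recurrent = np.zeros((T, d))
for t in range(T):
    state += np.outer(k[t], v[t])
    out_recurrent[t] = q[t] @ state

assert np.allclose(out_parallel, out_recurrent)
```

Because inference cost per token is constant instead of growing with context length, a hybrid model can keep a few softmax layers for quality and make the rest linear, which is the pattern all the models listed above share.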
>Good enough? There's no such thing.
This is just wrong. Maybe you can't imagine good enough; I can. And I think "better" is going to start hitting diminishing returns as the velocity of improvements slows and each improvement becomes less meaningful. The "cost" of an LLM making mistakes is already pretty low; cutting it in half is better, sure, but it's so low already that I don't particularly care if mistakes get some multiple rarer.
From Zebra-Llama's arXiv page: Submitted on 22 May 2025
But the end state in my mind is telling an AI "build me XYZ", having it ask all the important questions over the course of a 30-minute chat while making reasonable decisions on all lower-level issues, then waking up the next morning to a live cloud-hosted test environment at a subdomain of the domain it said it would buy along with test builds of native apps for Android, iOS, Linux, macOS, and Windows, all with near-100% automated test coverage and passing tests. Coding agents feel like magic, but we're clearly not there yet.
And that's just coding. If someone wanted to generate a high-quality custom feature-length movie within the usage limits of a $20/mo AI plan, they'd be sorely disappointed.
Don't forget the billion dollars or so of GPUs they had access to that they left out of that accounting. Also the R&D cost of the Meta model they originally used. Then they added $5.6 million on top of that.