zlacker

Open weights means that the current prices for inference of Chinese models are indicative of their cost to run because.

https://openrouter.ai/moonshotai/kimi-k2.5

It's a fantasy to believe that every single one of these 8 providers is serving at incredibly subsidized dumping prices 50% below cost and once that runs out suddenly you'll pay double for 1M of tokens for this model. It's incredibly competitive with Sonnet 4.5 for coding at 20% of the token price.

I encourage you to become more familiar with the market and stop overextrapolating purely based on rumored OpenAI numbers.

replies(1): >>IhateA+Z2

>>deaux+(OP)
I'm not making any guesses, I happen to know for a fact what it costs. Please go try to sell inference and compete on price. You actually have no clue what you're talking about. I knew when I sent that response I was going to get "but Kimi!"

replies(3): >>mcmcmc+q5 >>anon37+S5 >>deaux+z7

>>IhateA+Z2
It really is insane how far it's gone. All of the subsidization and free usage is deeply anticompetitive, and it is only a profitable decision if they can recoup all the losses. It's either a bubble and everything will crash, or within a few years once the supplier market settles, they will eventually start engaging in cartel-like behavior and ratchet up the price level to turn on the profits.

>>IhateA+Z2
The numbers you stated sound off ($500k capex + electricity per 3 concurrent requests?). Especially now that the frontier has moved to ultra sparse MoE architectures. I’ve also read a couple of commodity inference providers claiming that their unit economics are profitable.

replies(1): >>IhateA+w32

>>IhateA+Z2
Okay, so you are claiming "every single one of those 8 providers, along with all others who don't serve openrouter but are at similar price points, are subsidizing by more than 50%".

That's an incredibly bold claim that would need quite a bit of evidence, and just waving "$500k in gpus" isn't it. Especially when individuals are reporting more than enough tps at native int4 with <$80k setups, without any of the scaling benefits that commercial inference providers have.

replies(1): >>IhateA+F02

>>deaux+z7
Imagine thinking that $80k setups to run Kimi and serve a single user session is evidence that inference providers are running at cost, or even close to it. Or that this fact is some sort of proof that token pricing will come down. All you one-shotted llm dependents said the same thing about Deepseek.

I know you need to cope because your competency is 1:1 correlated to the quality and quantity of tokens you can afford, so have fun with your Think for me SaaS while you can afford it. You have no clue the amount of engineering that goes into provide inference at scale. I wasn't even including the cost of labor.

replies(1): >>deaux+SJ3

>>anon37+S5
You're delusional, I didn't even include the labor the install and run the damn thing. More than 500k

>>IhateA+F02
It directly disproves this wild claim

> You still need $500k in GPUs and a boatload of electricity to serve like 3 concurrent sessions at a decent tok/ps.

as being patent bullshit, after which the burden is squarely on you to back up the remainder of your claims.

replies(1): >>IhateA+UP3

>>deaux+SJ3
You are literally telling me that an open source model costs $80k "at decent tok/ps (whatever that means)" to run a single session as proof something. How come people aren't dropping Anthropic for Kimi, it costs 10x less... You aren't a serious person worth engaging with.