zlacker

[return to "Cloudflare builds OAuth with Claude and publishes all the prompts"]
1. stego-+6b[view] [source] 2025-06-02 15:27:21
>>gregor+(OP)
On the one hand, I would expect LLMs to be able to crank out such code when prompted by skilled engineers who also understand prompting these tools correctly. OAuth isn’t new, has tons of working examples to steal as training data from public projects, and in a variety of existing languages to suit most use cases or needs.

On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things - research, materials science, economies, inventions, etc - because that requires learning “in real time” from information sources you’re literally generating in that moment, not decades of Stack Overflow responses without context. That has been bandied about for years, with no evidence to show for it beyond specifically cherry-picked examples, often from highly-controlled environments.

I never doubted that, with competent engineers, these tools could be used to generate “new” code from past datasets. What I continue to doubt is the utility of these tools given their immense costs, both environmentally and socially.

◧◩
2. diggan+ao[view] [source] 2025-06-02 16:44:37
>>stego-+6b
> On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things - research, materials science, economies, inventions, etc

Does it even have to be able to do so? Just the ability to speed up exploration and validation based on what a human tells it to do is already enormously useful, depending on how much you can speed up those things, and how accurate it can be.

Too slow or too inaccurate and it'll have a strong slowdown factor. But once some threshold has been reached where it makes either of those things faster, I'd probably consider the whole thing "overall useful". Of course that isn't the full picture, and ignoring all the tradeoffs is kind of cheating; there are more things to consider too, as you mention.

I'm guessing we aren't quite over the threshold yet, because the whole thing is still very young all things considered, although the ecosystem is already pretty big. Generally, things tend to grow beyond their usefulness at first, and we're at that stage right now: people are shooting it in all kinds of directions to see what works and what doesn't.

◧◩◪
3. dingnu+hr[view] [source] 2025-06-02 16:59:14
>>diggan+ao
> Just the ability to speed up exploration and validation based on what a human tells it to do is already enormously useful, depending on how much you can speed up those things, and how accurate it can be.

The big question is: is it useful enough to justify the cost when the VC subsidies go away?

My phone recently offered me Gemini "now for free" and I thought "free for now, you mean. I better not get used to that. They should be required to call it a free trial."

◧◩◪◨
4. jsnell+Wx[view] [source] 2025-06-02 17:44:22
>>dingnu+hr
Inference is actually quite cheap. Like, a highly competitive LLM query can cost 1/25th of a search query. And that is not due to inference being subsidized by VC money.

It's also getting cheaper all the time. Something like 1000x cheaper in the last two years at the same quality level, and there's not yet any sign of a plateau.

So it'd be quite surprising if the only long-term business model turned out to be subscriptions.

◧◩◪◨⬒
5. Denzel+0D[view] [source] 2025-06-02 18:23:37
>>jsnell+Wx
Can you link to any sources that support your claim?
◧◩◪◨⬒⬓
6. jsnell+fI[view] [source] 2025-06-02 19:00:29
>>Denzel+0D
Sure. Here's something I'd written on the subject that I'd left lying in my drafts folder for a month, but I've now published just for you :)

https://www.snellman.net/blog/archive/2025-06-02-llms-are-ch...

It has links to public sources on the pricing of both LLMs and search, and explains why the low inference prices can't be due to the inference being subsidized. (And while there are other possible explanations, it includes a calculator for what the compound impact of all of those possible explanations could be.)

◧◩◪◨⬒⬓⬔
7. Denzel+H71[view] [source] 2025-06-02 21:49:42
>>jsnell+fI
Thanks for sharing!

It's worthwhile to note that https://github.com/deepseek-ai/open-infra-index/blob/main/20... shows cost vs. theoretical income. They don't show 80% gross margins and there's probably a reason they don't share their actual gross margin.

OpenAI is the easiest counterexample that proves inference is subsidized right now. They've taken $50B in investment; surpassed 400M WAUs (https://www.reuters.com/technology/artificial-intelligence/o...); lost $5B on $4B in revenue for 2024 (https://finance.yahoo.com/news/openai-thinks-revenue-more-tr...); and project they won't be cash-flow positive until 2029.

Prices would be significantly higher if OpenAI were priced for unit profitability right now.

As for the mega-conglomerates (Google, Meta, Microsoft), GenAI is a loss leader to build platform power. GenAI doesn't need to be unit profitable, it just needs to attract and retain people on their platform, e.g. you need a Google Cloud account to use the Gemini API.

◧◩◪◨⬒⬓⬔⧯
8. jsnell+Cc1[view] [source] 2025-06-02 22:21:03
>>Denzel+H71
Thanks!

I believe the API prices are not subsidized, and there's an entire section devoted to that. To recap:

1) pure compute providers (rather than companies providing both the model and the compute) can't really gain anything from subsidizing. That market is already commoditized and supply-limited.

2) there is no value to gaining paid API market share -- the market share isn't sticky, and there's no benefit to just getting more usage since the terms of service for all the serious providers promise that the data won't be used for training.

3) we have data from a frontier lab on what the economics of their paid API inference are (but not the economics of other types of usage)

So the API prices set a ceiling on what the actual cost of inference can be. And that ceiling is very low relative to the prices of a comparable (but not identical) non-AI product category.

That's a very distinct case from free APIs and consumer products. The former is being given out for no cost in exchange for data, the latter for data and sticky market share. So unlike paid APIs, the incentives are there.

But given the cost structure of paid APIs, we can tell that it would be trivial for the consumer products to be profitably monetized with ads. They've got a ton of users, and the way users interact with their main product would be almost perfect for advertising.

The reason OpenAI is not making a profit isn't that inference is expensive. It's that they're choosing not to monetize like 95% of their users, despite the unit economics being very lucrative in principle. They're making a loss because for now they can, and for now the only goal of their consumer business is to maximize their growth and consumer mindshare.

If OpenAI needed to make a profit, they would not raise their prices on things being paid for. They'd just need to extract a very modest revenue from their unpaid users. (It's 500M unpaid users. To make $5B/year in revenue from them, you'd need just a $1 ARPU. That's an order of magnitude below what's realistic. Hell, that's lower than the famously hard to monetize Reddit's global ARPU.)

◧◩◪◨⬒⬓⬔⧯▣
9. Denzel+su1[view] [source] 2025-06-03 00:25:06
>>jsnell+Cc1
Yes, I read your entire article and that section, hence my response. :)

1) Help me understand what you mean by “pure compute providers” here. Who are the pure compute providers and what are their financials including pricing?

2) I already responded to this - platform power is one compelling value gained from paid API market share.

3) If the frontier lab you’re talking about is DeepSeek, I’ve already responded to this as well, and you didn’t even concede the point that the 80% margin you cited is inaccurate given that it’s based on a “theoretical income”.

◧◩◪◨⬒⬓⬔⧯▣▦
10. jsnell+vE1[view] [source] 2025-06-03 02:01:49
>>Denzel+su1
1) Any companies that host APIs using open-weights models (LLama, Gemma, Deepseek, etc) in exchange for money. There's a lot of them around, at different scales and different parts of a hosting provider's lifecycle. Check for example the Openrouter page for any open-weights model for hosters of that model with price data.

2) (API) platform power having no value in this space has been demonstrated repeatedly. There are no network effects, because you can't use the user data to improve models. There is no lock-in, as the models are easy to substitute due to how incredibly generic the interface is. There is no loyalty, the users will jump ship instantly when better models are released. There is no purchasing power from having more scale, the primary supplier (Nvidia) isn't giving volume discounts and is actually giving preferential allocations to smaller hosting providers to fragment the market as much as possible.

Did you have some other form of platform power in mind?

3) I did not concede that point because I don't think it's relevant. They provide the exact data for their R1 inference economics:

- The cost per node: an 8×H800 node rents for $16/hour ≈ $0.0045/s (rental price, so that covers capex + opex).

- The throughput per node: given their traffic mix, a single node processes 75k input tokens/s and generates 15k output tokens/s.

- Pricing: $0.35/1M input tokens (weighted for cache hit/miss) and $2.2/1M output tokens.

- From which it follows that per-node revenue is 75k/s × $0.35/1M ≈ $0.026/s for input, and 15k/s × $2.2/1M = $0.033/s for output. That's about $0.06/s in revenue, substantially higher than the cost of revenue.
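The per-node arithmetic above as a runnable sketch (all figures are the ones quoted in this thread, not independently verified):

```python
# DeepSeek R1 per-node inference economics, using the thread's figures.
node_cost_per_hour = 16.0                       # USD, rented 8xH800 node
node_cost_per_sec = node_cost_per_hour / 3600   # ~$0.0044/s, covers capex + opex

input_tok_per_sec = 75_000     # per-node input throughput, blended traffic mix
output_tok_per_sec = 15_000    # per-node output throughput
price_input = 0.35 / 1_000_000   # USD per input token (cache-weighted)
price_output = 2.2 / 1_000_000   # USD per output token

revenue_per_sec = (input_tok_per_sec * price_input
                   + output_tok_per_sec * price_output)

print(f"cost:    ${node_cost_per_sec:.4f}/s")
print(f"revenue: ${revenue_per_sec:.4f}/s")
print(f"revenue/cost: {revenue_per_sec / node_cost_per_sec:.1f}x")
```

Running it shows revenue per node exceeding cost by an order of magnitude, which is the margin claim being defended.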

Like, that just is what the economics of paid R1 inference are (there being V3 in the mix doesn't matter, they're the same parameter count). Inference is really, really cheap both in absolute cost/token terms and relative to the prices people are willing to pay.

Their aggregate margins are different, and we don't know how different, because here too they choose to also provide free service with no ads. But that too is a choice. If they just stopped doing that and rented fewer GPUs, their margins would be very lucrative. (Not as high as the computation suggests, since the unpaid traffic allows them to batch more efficiently, but that's not going to make a 5x difference.)

But fair enough, it might be cleaner to use the straight cost per token data rather than add the indirection of margins. Either way, it seems clear that API pricing is not subsidized.

[go to top]