zlacker

[return to "Cloudflare builds OAuth with Claude and publishes all the prompts"]
1. rienbd+s22[view] [source] 2025-06-03 06:30:13
>>gregor+(OP)
The commits are revealing.

Look at this one:

> Ask Claude to remove the "backup" encryption key. Clearly it is still important to security-review Claude's code!

> prompt: I noticed you are storing a "backup" of the encryption key as `encryptionKeyJwk`. Doesn't this backup defeat the end-to-end encryption, because the key is available in the grant record without needing any token to unwrap it?

I don’t think a non-expert would even know what this means, let alone spot the issue and direct the model to fix it.
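
For anyone wondering what the issue is: the grant's data is encrypted with a key that should only be recoverable by unwrapping it with a valid token, so a plaintext copy of that key stored in the same record makes the wrapping pointless. A rough sketch of the shape (field names other than `encryptionKeyJwk` are my guesses, not Cloudflare's actual schema):

```typescript
// Illustrative only -- not Cloudflare's real types.
interface GrantRecord {
  grantId: string;
  encryptedProps: ArrayBuffer;   // grant data, encrypted with a per-grant AES key
  wrappedKey: ArrayBuffer;       // that AES key, wrapped by a token-derived key
  encryptionKeyJwk?: JsonWebKey; // BUG: plaintext copy of the same AES key --
                                 // anyone who can read the record can decrypt
                                 // without ever presenting a token
}

// With the backup removed, decryption requires a token-derived key to unwrap:
async function unwrapGrantKey(record: GrantRecord, tokenKey: CryptoKey) {
  return crypto.subtle.unwrapKey(
    "raw",
    record.wrappedKey,
    tokenKey,
    { name: "AES-KW" },
    { name: "AES-GCM" },
    false,
    ["decrypt"],
  );
}
```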

◧◩
2. victor+Ng2[view] [source] 2025-06-03 08:58:34
>>rienbd+s22
That is how LLMs should be used today: an expert prompts it and checks the code. It still saves a lot of time versus typing everything from scratch. Just the other day I was working on a prototype and let Claude write the code for an auth flow. Everything was good until the last step, where it just sent the user id as a plain string alongside the valid token. So if you held any valid token, you could pass in any user id and become that user. Still saved me a lot of time versus doing it from scratch.
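
A minimal Express sketch of that class of bug (the route, the JWT usage, and all the names here are my own illustration, not the actual prototype code):

```typescript
import express, { Request, Response, NextFunction } from "express";
import jwt from "jsonwebtoken";

const app = express();
const SECRET = process.env.JWT_SECRET ?? "dev-only-secret";

// Verify the token's signature and stash its claims on the request.
function verifyToken(req: Request, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace("Bearer ", "") ?? "";
  try {
    (req as any).claims = jwt.verify(token, SECRET) as jwt.JwtPayload;
    next();
  } catch {
    res.status(401).json({ error: "invalid token" });
  }
}

// The flawed pattern: the token is valid, but identity is client-supplied.
app.get("/profile", verifyToken, (req, res) => {
  const userId = req.query.userId as string; // BUG: attacker picks this freely
  res.json({ userId /* ...load and return this user's data... */ });
});

// The fix: derive the user id from the verified token's own claims.
app.get("/profile-fixed", verifyToken, (req, res) => {
  const userId = ((req as any).claims as jwt.JwtPayload).sub;
  res.json({ userId });
});
```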
◧◩◪
3. XCSme+sx2[view] [source] 2025-06-03 11:45:41
>>victor+Ng2
> Still saves a lot of time vs typing everything from scratch.

In my experience, it takes longer to debug/instruct the LLM than to write it from scratch.

◧◩◪◨
4. Culona+BA2[view] [source] 2025-06-03 12:13:38
>>XCSme+sx2
Depends on what you're doing. For example, when you're writing React components and styling them with Tailwind, I find the speedup is close to 10x.
◧◩◪◨⬒
5. azemet+2D2[view] [source] 2025-06-03 12:30:20
>>Culona+BA2
Isn’t this because the LLMs had a million-plus React tutorials/articles/books/repos to train on?

I mean, I try to use them for Svelte or Vue and they still recommend React snippets sometimes.

◧◩◪◨⬒⬓
6. Culona+vS2[view] [source] 2025-06-03 13:57:16
>>azemet+2D2
Generally speaking, the "LLMs" I use are always the latest thinking versions of the flagship models (Grok 3, Gemini 2.5, ...). GPT-4o (and equivalents) are a mess.

But you're correct: with more exotic and/or very new libraries, the output quality is mixed. For my current stack (TypeScript, Node, Express, React 19, React Router 7, Drizzle and Tailwind 4), both Grok 3 (the paid one with 100k+ context) and Gemini 2.5 are pretty damn good. But I use them for prototyping, i.e. quickly putting together new stuff, for types, refactorings... I would never trust their output verbatim. (YET.)

"Build an app that ..." would be a nightmare, but React-like UI code at a sufficiently granular level is pretty much the best-case scenario for LLMs, since your components should be relatively isolated from the rest of the app and not too big anyway.
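
To make "sufficiently granular" concrete, here's the kind of component I mean -- props in, markup out, no knowledge of the rest of the app (illustrative, not from any real codebase):

```tsx
type BadgeProps = {
  label: string;
  tone?: "info" | "warning";
};

// Small, isolated React 19 + Tailwind component: the best-case LLM task,
// since correctness is checkable at a glance and nothing else depends on it.
export function Badge({ label, tone = "info" }: BadgeProps) {
  const tones = {
    info: "bg-blue-100 text-blue-800",
    warning: "bg-yellow-100 text-yellow-800",
  };
  return (
    <span className={`inline-block rounded px-2 py-1 text-xs font-medium ${tones[tone]}`}>
      {label}
    </span>
  );
}
```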
