Cloudlflare builds OAuth with Claude and publishes all the prompts

>>gregor+(OP)
On the one hand, I would expect LLMs to be able to crank out such code when prompted by skilled engineers who also understand prompting these tools correctly. OAuth isn’t new, has tons of working examples to steal as training data from public projects, and in a variety of existing languages to suit most use cases or needs.

On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things - research, materials science, economies, inventions, etc - because that requires learning “in real time” from information sources you’re literally generating in that moment, not decades of Stack Overflow responses without context. That has been bandied about for years, with no evidence to show for it beyond specifically cherry-picked examples, often from highly-controlled environments.

I never doubted that, with competent engineers, these tools could be used to generate “new” code from past datasets. What I continue to doubt is the utility of these tools given their immense costs, both environmentally and socially.

>>stego-+6b
I like to make a rough analogy with autonomous vehicles. There's a leveling system from 1 (old school cruise control) to 5 (full automation):

* We achieved Level 2 autonomy first, which requires you to fully supervise and retain control of the vehicle and expect mistakes at any moment. So kind of neat but also can get you in big trouble if you don't supervise properly. Some people like it, some people don't see it as a net gain given the oversight required.

^ This is where Tesla "FSD beta" is at, and probably where LLM codegen tools are at today.

* After many years we have achieved a degree of Level 4 autonomy on well-trained routes albeit with occasional human intervention. This is where Waymo is at in certain cities. Level 4 means autonomy within specific but broad circumstances like a given area and weather conditions. While it is still somewhat early days it looks like we can generally trust these to operate safely and ask for help when they are not confident. Humans are not out of the loop.[1]

^ This is probably what where we can expect codegen to grow after many more years of training and refinement in specific domains. I.e. a lot of what CloudFlare engineers did with their prompt engineering tweaking was of this nature. Think of them as the employees driving the training vehicles around San Francisco for the past decade. And similarly, "L4 codegen" needs to prioritize code safety which in part means ensuring humans can understand situations and step in to guide and debug when the tool gets stuck.

* We are still nowhere close to Level 5 "drive anywhere and under any conditions a human can." And IMHO it's not clear we ever will based purely on the technology and methods that got us to L4. There are other brain mechanisms at work that need to be modeled.

[1] https://www.cnbc.com/2023/11/06/cruise-confirms-robotaxis-re...

zlacker