
> I'm not sure the cat was ever in the bag for LLMs.

I think timelines are important here. For example, in 2015 there was no such thing as Transformers, and while there were AGI x-risk folks (e.g. MIRI), they were generally considered to be quite kooky. I think AGI was very credibly "cat in the bag" at that time; it doesn't happen without 1000s of man-years of focused R&D, and only a few companies can even move the frontier.

I don't think the claim should be "we could have prevented LLMs from ever being invented", just that we can perhaps delay it long enough to be safe(r). To bring it back to the original thread, Sam Altman's explicit position is that in the matrix of "slow vs. fast takeoff" vs. "starting sooner vs. later", a slow takeoff starting sooner is the safest choice. The reasoning is that you would prefer a slow takeoff starting later, but the thing most likely to kill everyone is a fast takeoff, and if you try for a slow takeoff later, you might build up a capability overhang and accidentally get a fast takeoff instead. As we can see, it takes society (and government) years to catch up to what is going on, so we don't want anything to happen faster than we can react to it.

A great example of this overhang dynamic would be Transformers circa 2018 -- Google was working on LLMs internally, but didn't know how to use them to their full capability. With GPT (and particularly after Stable Diffusion and LLaMA) we saw a massive explosion in capability-per-compute for AI as the broader community optimized both prompting techniques (e.g. "think step by step", Chain of Thought) and underlying algorithmic/architectural approaches.

At this time it seems to me that widely releasing LLMs has both i) caused a big capability overhang to be harvested, preventing it from contributing to a fast takeoff later, and ii) caused orders of magnitude more resources to be invested in pushing the capability frontier, making the overall takeoff trajectory faster. Neither of those would likely have happened for at least a couple of years if OpenAI hadn't released ChatGPT when it did. It's hard for me to calculate whether on net this brings dangerous capability levels closer, but I think there's a good argument that it makes the timeline much more predictable (we're now capped by global GPU production), and therefore reduces the tail risk of the "accidental unaligned AGI in Google's datacenter that can grab lots more compute from other datacenters" type of scenario (aka "foom").

> LLMs are clearly not currently an "existential threat"

Nobody is claiming (at least, nobody credible in the x-risk community is claiming) that GPT-4 is an existential threat. The claim is that, looking at the trajectory and predicting where we'll be in 5-10 years, GPT-10 could be very scary, so we should make sure we're prepared for it -- and slow down now if we think we don't have time to build GPT-10 safely on our current trajectory. Every exponential curve flattens into an S-curve eventually, but I don't see a particular reason to posit that this one will be exhausted before human-level intelligence; quite the opposite. And if we don't solve fundamental problems like prompt-hijacking and figure out how to actually durably convey our values to an AI, it could be very bad news when we eventually build a system that is smarter than us.

While Eliezer Yudkowsky takes the maximally pessimistic stance that AGI is by default ruinous unless we solve alignment, there are plenty of people who take a more epistemically humble position that we simply cannot know how it'll go. I view it as a coin toss as to whether an AGI directly descended from ChatGPT would stay aligned to our interests. Some view it as Russian roulette. But the point is: would you play Russian roulette with all of humanity? Or would you wait until you can be sure the risk is lower?

I think it's plausible that with a bit more research we can crack Mechanistic Interpretability and get to a point where, for example, we can quantify to what extent an AI is deceiving us (ChatGPT already does this in some situations), and to what extent it is actually using reasoning that maps to our values, vs. alien logic that does not preserve things humanity cares about when you give it power.

> nuclear weapon control by limiting information has already failed.

In some sense yes, but also note that for almost 80 years we have prevented _most_ countries from acquiring this tech. The Soviet Union built its own program (helped along by espionage), and some other countries were granted tech transfers or used espionage themselves. But for the rest of the world, the cat is still in the bag. I think you can make a good analogy here: if there is an arms race, then superpowers will build the technology to maintain their balance of power. If everybody agrees not to build it, then perhaps there won't be a race. (I'm extremely pessimistic about this level of coordination though.)

Even with the dramatic geopolitical power granted by possessing nuclear weapons, we have managed to pursue a "security through obscurity" regime, and it has worked to prevent further spread of nuclear weapons. This is why I find the software-centric "security by obscurity never works" stance to be myopic. It is usually true in the software security domain, but it's not some universal law.