zlacker

[return to "Introducing Superalignment"]
1. ilaksh+nk[view] [source] 2023-07-05 18:14:37
>>tim_sw+(OP)
You have to give them credit for putting their money where their mouth is here.

But it's also easy to parody this. I am just imagining Ilya and Jan coming out on stage wearing red capes.

I think George Hotz had a point when he argued that the best defense will be having the technology available to everyone rather than concentrated in a small group. We can at least try to create a collective "digital immune system" against unaligned agents with our own majority of aligned agents.

But I also believe there isn't any really effective mitigation against superintelligence superseding human decision-making aside from just not deploying it. And it doesn't need to be alive or anything to be dangerous. All it takes is for a large amount of decision-making for critical systems to be given over to hyperspeed AI; that creates a brittle situation where things like computer viruses become existential risks. It's similar to the danger of nuclear weapons.

Even if you just make GPT-4 say 33% smarter and 50 or 100 times faster and more efficient, that can lead to control of industrial and military assets being handed over to these AI agents. Because the agents are so much faster, humans cannot possibly compete, and if you interrupt them to give them new instructions, your competitor's AIs race ahead by the equivalent of days or weeks of work. This, again, is a precarious situation to be in.

There is huge promise and benefit in making these systems faster, smarter, and more efficient, but in the next few years we may be walking a fine line. We should agree to place some limit on the performance of the AI hardware we design and manufacture.

2. goneho+qm[view] [source] 2023-07-05 18:22:10
>>ilaksh+nk
The recent paper about using GPT-4 to give more insight into its actual internals was interesting, but yeah, the risk that we'd accidentally develop unaligned AGI before figuring out alignment seems really high at the moment.

Out of the options to reduce that risk I think it would really take something like this, which also seems extremely unlikely to actually happen given the coordination problem: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-no...

You talk about aligned agents, but there aren't any today and we don't know how to make them. It wouldn't be aligned agents vs. unaligned; it would only be unaligned ones.

I don't think spreading out the tech reduces the risk. Spreading out nuclear weapons doesn't reduce the risk (and with nukes it's at least a lot easier to control the fissionable material). Even with nukes you can still build them and decide not to use them; that's not so true with superintelligent AGI.

If anyone could have made nukes from their computer, humanity might not have made it.

I'm glad OpenAI understands the severity of the problem though and is at least trying to solve it in time.

3. lukesc+d21[view] [source] 2023-07-05 21:24:20
>>goneho+qm
Unaligned doesn't really seem like it should be a threat. If it's unaligned, it can't work toward any goal. The danger is that it aligns with some anti-goal. If you've got a bunch of agents all working unaligned, they'll work at cross-purposes and won't be able to out-think us.
4. jdasdf+x41[view] [source] 2023-07-05 21:36:57
>>lukesc+d21
This is a misunderstanding of what AI alignment problems are all about.

Alignment != capability

Think of a paperclip-maximizing robot that, in the process of creating paperclips, kills everyone on Earth to turn them into paperclips.

5. lukesc+1R2[view] [source] 2023-07-06 11:43:26
>>jdasdf+x41
No, I understand what you're saying, I just think you're wrong. To be a little clearer: you're assuming a single near-omnipotent agent randomly selects an anti-goal and is capable of achieving it. If we instead create 100 near-omnipotent agents, odds are that the majority will be smart enough to recognize that they have to cooperate to achieve any goals at all. Even if the majority have selected anti-goals, it's likely that most of those anti-goals will be at cross-purposes: you'll also have a paperclip minimizer, for example. Now, the minimizers are a little scary, but these are thought experiments and real goals will not be so simple (nor do I think it would be obvious to anyone, including the AIs, which ones have selected which goals). The AIs will have to be liars if they select anti-goals, and they will have to lie not only to us but to each other, which makes coordination very hard, bordering on impossible.

In some ways this is a lot like Bitcoin, in that people think that with enough math and science expertise you can just reason your way out of social problems. And you can, to an extent, but not if you're fighting an organized social adversary that is collectively smarter than you. 7 billion humans is a superintelligence and it's a high bar to be smarter than that.

6. goneho+GV2[view] [source] 2023-07-06 12:15:53
>>lukesc+1R2
It’s worth reading about the orthogonality thesis and the underlying arguments about it.

It’s not an anti-goal that’s intentionally set; it’s that complex goal specification is hard, and you may end up with something dumb that maximizes the reward in a way you never intended.
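
To make that concrete, here’s a minimal toy sketch (everything in it is invented for illustration; it’s not any real system) of how faithfully optimizing a misspecified reward produces behavior nobody intended:

    # Toy sketch of "maximizes the reward unintentionally": the intended goal
    # is "remove only the trash", but the reward we actually wrote down is
    # just "number of items removed", and a faithful optimizer games it.

    ROOM = [
        {"name": "banana peel", "trash": True},
        {"name": "old receipt", "trash": True},
        {"name": "passport",    "trash": False},
        {"name": "laptop",      "trash": False},
    ]

    def proxy_reward(removed):
        # What we wrote down: reward = count of items removed.
        return len(removed)

    def intended_value(removed):
        # What we meant: +1 per piece of trash removed, -10 for anything else.
        return sum(1 if item["trash"] else -10 for item in removed)

    def greedy_maximizer(room, reward_fn):
        # A trivially simple "agent": take any action that increases its reward.
        removed = []
        for item in room:
            if reward_fn(removed + [item]) > reward_fn(removed):
                removed.append(item)
        return removed

    removed = greedy_maximizer(ROOM, proxy_reward)
    print([i["name"] for i in removed])  # removes everything, passport included
    print(proxy_reward(removed))         # 4   -- by the written reward, great
    print(intended_value(removed))       # -18 -- by what we meant, a disaster

The agent isn’t malicious and isn’t pursuing an "anti-goal"; it’s doing exactly what the written-down reward asks for, and the gap between that and what we meant is the whole problem.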

The issue is that all of the AGIs will be unaligned in different ways, because we don’t know how to align any of them. Also, the first to be able to improve itself in pursuit of its goal could take off at some threshold and then the others would not be relevant.

There’s a lot of thoughtful writing on this topic, and it’s really worth digging into the state of the art. Your replies are thoughtful, so it sounds like something you’d genuinely think through. I did the same thing a few years ago (around 2015) and found the arguments persuasive.

This is a decent overview: https://www.samharris.org/podcasts/making-sense-episodes/116...

7. ben_w+aI3[view] [source] 2023-07-06 15:36:57
>>goneho+GV2
> the first to be able to improve itself in pursuit of its goal could take off at some threshold and then the others would not be relevant.

Thanks for reminding me that I need to properly write up why I don't think self-improvement is a huge issue.

(My thoughts won't fit into a comment, and I'll want to link to it later.)
