Introducing Superalignment

>>tim_sw+(OP)
You have to give them credit for putting their money where their mouth is here.

But it's also easy to parody this. I am just imagining Ilya and Jan coming out on stage wearing red capes.

I think George Hotz made sense when he pointed out that the best defense will be having the technology available to everyone rather than a small group. We can at least try to create a collective "digital immune system" against unaligned agents with our own majority of aligned agents.

But I also believe that there isn't any really effective mitigation against superintelligence superseding human decision making aside from just not deploying it. And it doesn't need to be alive or anything to be dangerous. All you need is for a large amount of decision-making for critical systems to be given over to hyperspeed AI and that creates a brittle situation where things like computer viruses can be existential risks. It's something similar to the danger of nuclear weapons.

Even if you just make GPT-4, say, 33% smarter and 50 or 100 times faster and more efficient, that can lead to control of industrial and military assets being handed over to these AI agents. Because the agents are so much faster, humans cannot possibly compete, and if you interrupt them to try to give them new instructions, then your competitor's AIs race ahead by the equivalent of days or weeks of work. This, again, is a precarious situation to be in.

There is huge promise and benefit from making the systems faster, smarter, and more efficient, but in the next few years we may be walking a fine line. We should agree to place some limitation on the performance level of AI hardware that we will design and manufacture.

>>ilaksh+nk
The recent paper about using GPT-4 to give more insight into its actual internals was interesting, but yeah, the risks seem really high at the moment that we'd accidentally develop unaligned AGI before figuring out alignment.

Out of the options to reduce that risk I think it would really take something like this, which also seems extremely unlikely to actually happen given the coordination problem: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-no...

You talk about aligned agents - but there aren't any today and we don't know how to make them. It wouldn't be aligned agents vs. unaligned, it's only unaligned.

I don't think spreading out the tech reduces the risk. Spreading out nuclear weapons doesn't reduce the risk (and with nukes at least it's a lot easier to control the fissionable materials). Even with nukes you can still create them and decide not to use them; that's not so true of superintelligent AGI.

If anyone could have made nukes from their computer, humanity might not have made it.

I'm glad OpenAI understands the severity of the problem though and is at least trying to solve it in time.

>>goneho+qm
Unaligned doesn't really seem like it should be a threat. If it's unaligned it can't work toward any goal. The danger is that it aligns with some anti-goal. If you've got a bunch of agents all working unaligned, they will work at cross-purposes and won't be able to out-think us.

>>lukesc+d21
This is a misunderstanding of what AI alignment problems are all about.

Alignment != capability

Think of a paperclip-maximizing robot that, in the process of making paperclips, kills everyone on Earth to turn them into paperclips.

>>jdasdf+x41
Corporations like Saudi Aramco are already doing that. You don't need a superintelligent AI; corporations that maximize profit are already sufficient as misaligned superhuman agents.

>>climat+PC1
You can't maximize profit without customers, so they must be aligned with someone.

>>nights+4R1
In fairness, corporations can still be fraudulent.
