zlacker

[return to "Introducing Superalignment"]
1. Animat+es[view] [source] 2023-07-05 18:42:34
>>tim_sw+(OP)
Announcing the start of talking about planning the beginning of work on superalignment. This is just a marketing buzzword at this point.

They admit "Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us. Other assumptions could also break down in the future, like favorable generalization properties during deployment or our models’ inability to successfully detect and undermine supervision during training. and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs."

That's kind of scary. Is the situation really that bad, or is it just the hype department at OpenAI going too far?

2. Dennis+My[view] [source] 2023-07-05 19:09:53
>>Animat+es
That's an accurate assessment of the situation, according to every AI alignment researcher I've seen talk about it, including the relatively optimistic ones and people who work mainly on AI capabilities but have real knowledge of alignment.

This part in particular caught my eye: "Other assumptions could also break down in the future, like favorable generalization properties during deployment". There have been actual experiments on exactly this failure, known as goal misgeneralization: agents that appeared to successfully learn their objective in training, then pursued something else entirely when released into a broader environment.[1]
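
Here's a toy sketch of that failure mode in Python (my own illustration, not the setup from the video): the agent is rewarded for reaching a coin, but in every training episode the coin sits at the right end of the track, so the proxy goal "always move right" is indistinguishable from the intended goal "go to the coin" until deployment.

    # Toy goal-misgeneralization sketch (my own illustration, not the
    # experiment in the video): a 1-D track where the agent is rewarded
    # for reaching a coin. In training the coin is always at the right
    # end, so "always move right" scores perfectly.
    import random

    def reaches_coin(policy, start, coin, track_len=10, max_steps=20):
        pos = start
        for _ in range(max_steps):
            if pos == coin:
                return True  # reached the coin: reward
            pos = max(0, min(track_len - 1, pos + policy(pos)))
        return False

    go_right = lambda pos: +1  # the behavior the agent internalized

    random.seed(0)
    trials = 1000
    # Training distribution: coin fixed at the rightmost cell.
    train = sum(reaches_coin(go_right, random.randrange(10), 9)
                for _ in range(trials))
    # Deployment distribution: coin placed uniformly at random.
    deploy = sum(reaches_coin(go_right, random.randrange(10),
                              random.randrange(10))
                 for _ in range(trials))
    print(f"training success:   {train / trials:.0%}")   # 100%: looks aligned
    print(f"deployment success: {deploy / trials:.0%}")  # ~55%: proxy exposed

A perfect training score tells you nothing about which of the two goals was actually learned, which is exactly the point the quoted sentence is making.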

I've seen some leading AI researchers dismiss alignment concerns, but without engaging with the arguments at all. I have yet to see a serious rebuttal that addresses the things alignment researchers are actually worried about.

[1] https://www.youtube.com/watch?v=zkbPdEHEyEI
