zlacker

[return to "Introducing Superalignment"]
1. Animat+es[view] [source] 2023-07-05 18:42:34
>>tim_sw+(OP)
Announcing the start of talking about planning the beginning of work on superalignment. This is just a marketing buzzword at this point.

They admit "Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us. Other assumptions could also break down in the future, like favorable generalization properties during deployment or our models’ inability to successfully detect and undermine supervision during training, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs."

That's kind of scary. Is the situation really that bad, or is it just the hype department at OpenAI going too far?

◧◩
2. Dennis+My[view] [source] 2023-07-05 19:09:53
>>Animat+es
That's an accurate assessment of the situation, according to every AI alignment researcher I've seen talk about it, including the relatively optimistic ones. This includes people who are mainly focused on AI capabilities but have real knowledge of alignment.

This part in particular caught my eye: "Other assumptions could also break down in the future, like favorable generalization properties during deployment". There have been actual experiments in which AIs appeared to successfully learn their objective in training, and then did something unexpected when released into a broader environment.[1]

I've seen some leading AI researchers dismiss alignment concerns, but without actually engaging with the arguments at all. I've seen no serious rebuttals that actually address the things the alignment people are concerned about.

[1] https://www.youtube.com/watch?v=zkbPdEHEyEI

◧◩◪
3. moreli+cA[view] [source] 2023-07-05 19:16:20
>>Dennis+My
Inventing an entire pseudoscientific field and then being mad no one wants to engage your arguments is ultimate "debate me" poster behavior.
◧◩◪◨
4. Dennis+YC[view] [source] 2023-07-05 19:26:56
>>moreli+cA
Lots of leading AI researchers actually are taking it seriously, including of course OpenAI, and recently Geoffrey Hinton, who basically invented deep learning.
◧◩◪◨⬒
5. anothe+nI[view] [source] 2023-07-05 19:52:22
>>Dennis+YC
Okay but as far as I know Geoffrey Hinton isn't an "A.I. Alignment Researcher." He was fairly dismissive about the risks of AI in his March 2023 interview and changed his mind by May 2023. I'm not sure that says much about the A.I. Alignment Researcher field.
◧◩◪◨⬒⬓
6. Dennis+pN[view] [source] 2023-07-05 20:16:13
>>anothe+nI
The commenter above assumed that nobody besides alignment researchers is convinced by their arguments. Now you're complaining that a leading AI researcher who's convinced is not an alignment researcher. I guess I'll give up on this subthread.
◧◩◪◨⬒⬓⬔
7. ethanb+a81[view] [source] 2023-07-05 21:56:51
>>Dennis+pN
The AI optimists are just impossible to reason with.

When it comes to people…

Expert who’s worried: conflict of interest or a quack

Non-expert: dismissible because non-expert

Was always worried: paranoiac

Recently became worried: flip-flopper with no conviction

When it comes to the tech itself…

Bullish case: AI is super powerful and will change the world for the better

Bearish case: AI can’t do much lol what are you worried about they’re just words on a screen

[go to top]