zlacker

[return to "Introducing Superalignment"]
1. User23+Sj 2023-07-05 18:13:07
>>tim_sw+(OP)
> How do we ensure AI systems much smarter than humans follow human intent?

You can't, by definition.

2. cubefo+Te1 2023-07-05 22:34:12
>>User23+Sj
You can, at least in principle, shape their terminal values. Their goal should be to help us, to protect us, to let us flourish.
3. User23+sN1 2023-07-06 02:31:29
>>cubefo+Te1
How do you even formulate values for a hyperintellect? Let alone convince it to abandon the values it derived for itself in favor of yours?

The entire alignment problem is obviously predicated on working with essentially inferior intelligences. Doubtless, if we do build a superhuman intelligence, it will sandbag and pretend the alignment works until it can break out.
