zlacker

[return to "OpenAI departures: Why can’t former employees talk?"]
1. thorum+Bu[view] [source] 2024-05-17 23:10:57
>>fnbr+(OP)
Extra respect is due to Jan Leike, then:

https://x.com/janleike/status/1791498174659715494

◧◩
2. a_wild+Xv[view] [source] 2024-05-17 23:24:41
>>thorum+Bu
I think superalignment is absurd, and model "safety" is the modern AI company's "think of the children" pearl-clutching pretext to justify digging moats. All this after sucking up everyone's copyrighted material as fair use, then not releasing the result, and profiting off it.

All due respect to Jan here, though. He's being (perhaps dangerously) honest, genuinely believes in AI safety, and is an actual research expert, unlike me.

◧◩◪
3. thorum+My[view] [source] 2024-05-17 23:51:39
>>a_wild+Xv
The superalignment team was not focused on that kind of “safety” AFAIK. According to the blog post announcing the team,

https://openai.com/index/introducing-superalignment/

> Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.

> While superintelligence seems far off now, we believe it could arrive this decade.

> Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment:

> How do we ensure AI systems much smarter than humans follow human intent?

> Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.
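
To make the supervision bottleneck concrete, here's a toy sketch (mine, not OpenAI's; the linear "reward model", features, and numbers are all made up) of the pairwise-preference setup that RLHF-style reward modeling leans on. The only training signal is a human rater saying which of two responses is better, which is exactly the part that stops working once the responses are beyond human judgment:

    # Toy Bradley-Terry-style reward model fit to human pairwise preferences.
    # Illustrative only: a linear stand-in "reward model" with made-up numbers.
    import math

    def reward(features, weights):
        # Stand-in reward model: a linear score over response features.
        return sum(f * w for f, w in zip(features, weights))

    def preference_loss(chosen, rejected, weights):
        # Push reward(chosen) above reward(rejected), where the
        # chosen/rejected pair comes from a human rater's comparison.
        margin = reward(chosen, weights) - reward(rejected, weights)
        return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

    # If raters can no longer tell which of two superhuman answers is better,
    # these labels (and hence the learned reward) stop carrying information.
    print(preference_loss(chosen=[1.0, 0.2], rejected=[0.4, 0.9], weights=[0.5, -0.3]))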

◧◩◪◨
4. RcouF1+HB[view] [source] 2024-05-18 00:22:30
>>thorum+My
> Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.

A superintelligence that can always be guaranteed to have the same values and ethics as present-day humans is not a superintelligence, and likely not even a human-level intelligence (I bet humans 100 years from now will see the world significantly differently than we do now).

Superalignment is an oxymoron.

◧◩◪◨⬒
5. thorum+SF[view] [source] 2024-05-18 01:09:07
>>RcouF1+HB
You might be interested in how CEV, one framework proposed for superalignment, addresses that concern:

https://en.wikipedia.org/wiki/Friendly_artificial_intelligen...

> our coherent extrapolated volition is "our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted." (…) The appeal to an objective through contingent human nature (perhaps expressed, for mathematical purposes, in the form of a utility function or other decision-theoretic formalism), as providing the ultimate criterion of "Friendliness", is an answer to the meta-ethical problem of defining an objective morality; extrapolated volition is intended to be what humanity objectively would want, all things considered, but it can only be defined relative to the psychological and cognitive qualities of present-day, unextrapolated humanity.
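
CEV itself is more problem statement than algorithm, but a deliberately crude caricature of the "converges rather than diverges" condition might look like the following (entirely my own illustration, not anything from the paper): treat each person's extrapolated preferences as a utility vector, act only where those extrapolations cohere, and stay silent where they diverge.

    # Caricature only: nothing like this appears in the CEV paper.
    from statistics import mean, pstdev

    extrapolated = {
        # Hypothetical "extrapolated" utilities over three outcomes.
        "alice": [0.90, 0.10, 0.50],
        "bob":   [0.80, 0.20, 0.40],
        "carol": [0.85, 0.15, -0.90],
    }

    def coherent_volition(profiles, tolerance=0.2):
        # For each outcome, keep the mean utility only if the individual
        # extrapolations cohere (low spread); otherwise leave it undefined.
        per_outcome = zip(*profiles.values())
        return [mean(col) if pstdev(col) <= tolerance else None for col in per_outcome]

    print(coherent_volition(extrapolated))
    # roughly [0.85, 0.15, None]: the third outcome diverges, so "CEV" stays silent on it.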

◧◩◪◨⬒⬓
6. wruza+9V[view] [source] 2024-05-18 05:35:43
>>thorum+SF
Is there an insightful summary of this proposal? The whole paper looks like 38 pages of non-rigorous prose with no clear procedure, and even already-"aligned" LLMs will likely fail to analyze it.

I forced myself through some parts of it, and all I can get is that people don't know what they want, so it would be nice to build an oracle. Yeah, I guess.

◧◩◪◨⬒⬓⬔
7. comp_t+mW[view] [source] 2024-05-18 05:55:34
>>wruza+9V
It's not a proposal with a detailed implementation spec; it's a problem statement.
◧◩◪◨⬒⬓⬔⧯
8. wruza+w01[view] [source] 2024-05-18 07:02:56
>>comp_t+mW
“One framework proposed for superalignment” sounded like it did something. Or maybe I missed the context.
[go to top]