zlacker

[return to "OpenAI departures: Why can’t former employees talk?"]
1. thorum+Bu[view] [source] 2024-05-17 23:10:57
>>fnbr+(OP)
Extra respect is due to Jan Leike, then:

https://x.com/janleike/status/1791498174659715494

2. a_wild+Xv[view] [source] 2024-05-17 23:24:41
>>thorum+Bu
I think superalignment is absurd, and model "safety" is the modern AI company's "think of the children" pearl-clutching pretext to justify digging moats. All this after sucking up everyone's copyrighted material as fair use, then not releasing the result, and profiting off it.

All due respect to Jan here, though. He's being (perhaps dangerously) honest, genuinely believes in AI safety, and is an actual research expert, unlike me.

3. thorum+My[view] [source] 2024-05-17 23:51:39
>>a_wild+Xv
The superalignment team was not focused on that kind of “safety” AFAIK. According to the blog post announcing the team,

https://openai.com/index/introducing-superalignment/

> Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.

> While superintelligence seems far off now, we believe it could arrive this decade.

> Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment:

> How do we ensure AI systems much smarter than humans follow human intent?

> Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.

4. ndrisc+XA[view] [source] 2024-05-18 00:13:13
>>thorum+My
That doesn't really contradict what the other poster said. They're calling for regulation (digging a moat) to ensure systems are "safe" and "aligned" while ignoring that humans are not aligned, so these systems obviously cannot be aligned with humans; they can only be aligned with their owners (i.e. them, not you).
5. ihuman+cC[view] [source] 2024-05-18 00:27:14
>>ndrisc+XA
Alignment in the realm of AGI is not about getting everyone to agree. It's about whether or not the AGI is aligned to the goal you've given it. The paperclip AGI example is often used: you tell the AGI "optimize the production of paperclips," and the AGI starts blending people to extract the iron from their blood to produce more paperclips.

Humans are used to ordering around other humans, who bring common sense and laziness to the table and would probably not grind up people to produce a few more paperclips.

Alignment is about getting the AGI to be aligned with its owners; ignoring it means potentially putting more and more power into the hands of a box that you aren't quite sure is going to do the thing you want it to do. Alignment in the context of AGIs was always about ensuring the owners could control the AGIs, not that the AGIs could solve philosophy and get all of humanity to agree.

6. ndrisc+cD[view] [source] 2024-05-18 00:36:32
>>ihuman+cC
Right and that's why it's a farce.

> Whoa whoa whoa, we can't let just anyone run these models. Only large corporations who will use them to addict children to their phones and give them eating disorders and suicidal ideation, while radicalizing adults and tearing apart society using the vast profiles they've collected on everyone through their global panopticon, all in the name of making people unhappy so that it's easier to sell them more crap they don't need (a goal which is itself a problem in the face of an impending climate crisis). After all, we wouldn't want it to end up harming humanity by using its superior capabilities to manipulate humans into doing things for it to optimize for goals that no one wants!

7. concor+GW[view] [source] 2024-05-18 05:59:53
>>ndrisc+cD
A corporate dystopia is still better than extinction (assuming the latter is a reasonable fear).
8. simian+LY[view] [source] 2024-05-18 06:31:49
>>concor+GW
Neither is acceptable