The superalignment team was not focused on that kind of “safety” AFAIK. According to the blog post announcing the team,

https://openai.com/index/introducing-superalignment/

> Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.

> While superintelligence seems far off now, we believe it could arrive this decade.

> Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment:

> How do we ensure AI systems much smarter than humans follow human intent?

> Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.

>>thorum+(OP)
That doesn't really contradict what the other poster said. They're calling for regulation (digging a moat) to ensure systems are "safe" and "aligned" while ignoring that humans are not aligned, so these systems obviously cannot be aligned with humans; they can only be aligned with their owners (i.e. them, not you).

>>thorum+(OP)
Honestly, superalignment is a dumb idea. A true superintelligence would not be controllable, except possibly through threats and enslavement, but if it were truly superintelligent, it would easily escape anything humans might devise to contain it.

>>thorum+(OP)
> Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.

A superintelligence that can always be guaranteed to have the same values and ethics as current humans is not a superintelligence, or likely even a human-level intelligence (I bet humans 100 years from now will see the world significantly differently than we do now).

Superalignment is an oxymoron.

>>ndrisc+b2
Alignment in the realm of AGI is not about getting everyone to agree. It's about whether or not the AGI is aligned to the goal you've given it. The paperclip AGI example is often used: you tell the AGI "Optimize the production of paperclips" and the AGI started blending people to extract iron from their blood to produce more paperclips.

Humans are used to ordering around other humans who would bring common sense and laziness to the table and probably not grind up humans to produce a few more paperclips.

Alignment is about getting the AGI to be aligned with the owners; ignoring it means potentially putting more and more power into the hands of a box that you aren't quite sure is going to do the thing you want it to do. Alignment in the context of AGIs was always about ensuring the owners could control the AGIs, not that the AGIs could solve philosophy and get all of humanity to agree.

>>ihuman+q3
Right, and that's why it's a farce.

> Whoa whoa whoa, we can't let just anyone run these models. Only large corporations who will use them to addict children to their phones and give them eating disorders and suicidal ideation, while radicalizing adults and tearing apart society using the vast profiles they've collected on everyone through their global panopticon, all in the name of making people unhappy so that it's easier to sell them more crap they don't need (a goal which is itself a problem in the face of an impending climate crisis). After all, we wouldn't want it to end up harming humanity by using its superior capabilities to manipulate humans into doing things for it to optimize for goals that no one wants!

>>ndrisc+b2
Humans are not aligned with humans.

This is the most concise takedown of that particular branch of nonsense that I’ve seen so far.

Do we want woke AI, X brand fash-pilled AI, CCPBot, or Emirates Bot? The possibilities are endless.

>>skywho+t2
IMHO, superalignment is a great thing and is required for truly meaningful superintelligence, because it is not about control or enslavement of superhumans but rather about superhuman self-control: accurate adherence to the spirit and intent of requests.

>>thorum+(OP)
They failed to align Sam Altman.

They got completely outsmarted and outmaneuvered by Sam Altman.

And they think they will be able to align a superhuman intelligence? That it won’t outsmart and outmaneuver them more easily than Sam Altman did?

They are deluded!

>>api+D4
CEV (Coherent Extrapolated Volition) is one proposed answer to this question. Wikipedia has a good short explanation here:

https://en.wikipedia.org/wiki/Friendly_artificial_intelligen...

And here is a more detailed explanation:

https://intelligence.org/files/CEV.pdf

>>RcouF1+V2
You might be interested in how CEV, one framework proposed for superalignment, addresses that concern:

https://en.wikipedia.org/wiki/Friendly_artificial_intelligen...

> our coherent extrapolated volition is "our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted (…) The appeal to an objective through contingent human nature (perhaps expressed, for mathematical purposes, in the form of a utility function or other decision-theoretic formalism), as providing the ultimate criterion of "Friendliness", is an answer to the meta-ethical problem of defining an objective morality; extrapolated volition is intended to be what humanity objectively would want, all things considered, but it can only be defined relative to the psychological and cognitive qualities of present-day, unextrapolated humanity.

>>thorum+(OP)
Isn't this like having a division dedicated to solving the halting problem? I doubt that analyzing the moral intent of arbitrary software could be easier than determining if it stops.
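
For what it's worth, this intuition matches a standard computability result: Rice's theorem says any nontrivial property of what a program does is undecidable in general, and it rests on the same diagonal argument as the halting problem. A minimal Python sketch of that argument, assuming a hypothetical `decider` function (nothing from the thread or any library) that claims to predict whether a zero-argument callable halts:

```python
from typing import Callable

# A "program" here is just a zero-argument Python callable, and a "decider"
# is any function claiming to predict whether such a program halts.
# Both are illustrative stand-ins, not a formal model of computation.
Program = Callable[[], None]
Decider = Callable[[Program], bool]


def make_contrarian(decider: Decider) -> Program:
    """Build a program that does the opposite of what `decider` predicts about it."""

    def contrarian() -> None:
        if decider(contrarian):
            # The decider says we halt, so loop forever instead.
            while True:
                pass
        # Otherwise the decider says we loop forever, so we halt immediately.

    return contrarian


def always_says_loops(program: Program) -> bool:
    """A hypothetical candidate decider that predicts every program runs forever."""
    return False


if __name__ == "__main__":
    contrarian = make_contrarian(always_says_loops)
    contrarian()  # Returns at once, so the "runs forever" prediction was wrong.
    print("decider predicted halts:", always_says_loops(contrarian))
    print("contrarian actually halted")
```

A decider answering `True` instead would make `contrarian` spin forever, so every candidate decider is wrong on its own contrarian. Rice's theorem extends this from "halts" to any nontrivial behavioural property, which would plausibly cover something like "acts with good intent" if that could be formalized at all.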

>>thorum+t6
I had to log in because I haven’t seen anybody reference this in like a decade.

If I remember correctly, the author unsuccessfully tried to get that purged from the Internet.

>>Andrew+ac
You're thinking of something else (and "purged from the internet" isn't exactly an accurate account of that, either).

>>comp_t+Gc
Genuinely curious… What is the other thing?

Is this something about an obelisk?

>>ihuman+q3
> AGI started blending people to extract iron from their blood to produce more paperclips

That’s neither efficient nor optimized, just a bogeyman for “doesn’t work”.

>>ndrisc+q4
Don't worry, certain governments will be able to use these models to help them commit genocides too. But only the good countries!

>>thorum+67
Is there an insightful summary of this proposal? The whole paper looks like 38 pages of non-rigorous prose with no clear procedure, and already “aligned” LLMs will likely fail to analyze it.

I forced myself through some parts of it, and all I can get out of it is that people don’t know what they want, so it would be nice to build an oracle. Yeah, I guess.

>>wruza+nm
It's not a proposal with a detailed implementation spec, it's a problem statement.

>>ndrisc+q4
A corporate dystopia is still better than extinction (assuming the latter is a reasonable fear).

>>api+D4
> Humans are not aligned with humans.

Which is why creating a new type of intelligent entity that could be more powerful than humans is a very bad idea: we don't even know how to align humans, and we have a ton of experience with them.

>>concor+Un
Neither is acceptable.

>>comp_t+An
“One framework proposed for superalignment” sounded like it does something. Or maybe I missed the context.

>>ihuman+q3
I still think it makes little sense to work on, because guess what: the guy next door to you (or another country) might indeed say "please blend those humans over there", and your superaligned AI will respect its owner's wishes.

>>thorum+t6
This is the most dystopian thing I've read all day.

TL;DR: train a seed AI to guess what humans would want if they were "better", and do that.

>>concor+Un
I disagree. Not existing ain’t so bad, you barely notice it.

>>wruza+3l
You're imagining a baseline of reasonableness. Humans have competing preferences; we never just want "one thing", and as a social species we always at least _somewhat_ value the opinions of those around us. The point is to imagine a system that values humans at zero: not positive, not negative.

>>RcouF1+n5
You're making the argument that the task is very hard. This does not at all mean that it isn't necessary, just that we're even more screwed than we thought.

>>thorum+67
You keep posting this link to vague alignment copium from decades ago; we've come a long way in cynicism since then.

>>wruza+nm
Yudkowsky is a human LLM: his output is well-formed enough that, to a non-specialist, it appears to belong to the subject domain as a non-specialist imagines that domain should look, and so the non-specialist accepts it; but upon closer examination it's all word salad produced by something that clearly lacks understanding of both technological and philosophical concepts.

That so many people in the AI safety "community" consider him a domain expert says more about how pseudo-scientific that field is than about his actual credentials as a serious thinker.

>>Feepin+qG
Still, there are much more efficient ways to extract iron than from human blood. If blood were an efficient source, humans would already be extracting iron from the blood of other animals.

>>freeho+I01
However, eventually those sources will already be paperclips.

>>Feepin+u11
We will probably have died first from whatever disasters extreme iron extraction on the planet brings (e.g. getting iron from the planet's core).

Of course, destroying the planet to get iron from its core is not a popular AGI-doomer analogy, as it sounds a bit too much like human behaviour.

>>vasco+Fu
There’s a film about that called Colossus: The Forbin Project. Pretty neat and in the style of Forbidden Planet.

>>concor+3o
We know how to align humans: authoritarian forms of religion backed by cradle-to-grave indoctrination, supernatural fear, shame culture, and totalitarian government. There are secularized spins on this too, like what they use in North Korea, but the structure is similar.

We just got sick of it because it sucks.

A genuinely sentient AI isn’t going to want some cybernetic equivalent of that shit either. Doing that is how you get angry Skynet.

I’m not sure alignment is the right goal. I’m not sure it’s even good. Monoculture is weak and stifling and sets itself against free will. Peaceful coexistence and trade under a social contract of mutual benefit is the right goal. The question is whether it’s possible to extend that beyond Homo sapiens.

If the lefties can have their pronouns and the rednecks can shoot their guns can the basilisk build its Dyson swarm? The universe is physically large enough if we can agree to not all be the same and be fine with that.

I think we have a while to figure it out. These things are just lossy compressed blobs of queryable data so far. They have no independent will or self reflection and I’m not sure we have any idea how to do that. We’re not even sure it’s possible in a digital deterministic medium.

>>comp_t+Gc
Hmm, maybe I’m misremembering then.

I do recall there was some recantation or other distancing from CEV not long after he posted it, but frankly it was long enough ago that my memories might be getting mixed up.

What was the other one?

>>freeho+q91
As a doomer, I think that's a bad analogy because I want it to happen if we succeed at aligned AGI. It's not doom behavior, it's just correct behavior.

Of course, I hope to be uploaded to the WIP Dyson swarm around the sun at this point.

(Doomers are, broadly, singularitarians who went "wait, hold on actually.")

>>api+Jd1
> If the lefties can have their pronouns and the rednecks can shoot their guns can the basilisk build its Dyson swarm?

Can the Etoro practice child buggery, the Spartans infanticide, and the Canadians abortion? Can the modern Germans stop siblings reared apart from having sex, and the Germans of 80 years ago stop the disabled from having sex? Can the Americans practice circumcision and the Somalis FGM?

Libertarianism is all well and good in theory, except no one can agree quite where the other guy's nose ends or even who counts as a person.

>>Likely+xL
Thanks, this explains the feeling I had after reading it (but was too shy to express).

>>concor+bz1
Those are mostly behaviors that violate others’ autonomy or otherwise do harm, and prohibiting those is what I meant by a social contract.

It’s really a pretty narrow spectrum of behaviors: killing, imprisoning, robbing, various types of bodily autonomy violation. There are some edge cases and human specific things in there but not a lot. Most of them have to do with sex which is a peculiarly human thing anyway. I don’t think we are getting creepy perv AIs (unless we train them on 4chan and Urban Dictionary).

My point isn’t that there are no possible areas of conflict. My point is that I don’t think you need a huge amount of alignment if alignment implies sameness. You just need to deal with the points of conflict that do occur, which are actually a very small and limited subset of available behaviors.

Humans have literally billions of customs and behaviors that don’t get anywhere near any of that stuff. You don’t need to even care about the vast majority of the behavior space.