zlacker

[return to "Introducing Superalignment"]
1. Chicag+m9[view] [source] 2023-07-05 17:40:08
>>tim_sw+(OP)
From a layman's perspective when it comes to cutting-edge AI, I can't help but be a bit turned off by some of the copy. It seems to go out of its way to use purposefully exuberant language to make the risks seem even more significant, so that as an offshoot it implies the technology being worked on is that advanced. I'm trying to understand why it rubs me the wrong way here in particular when, frankly, it is just about the norm everywhere else (see Tesla with FSD, etc.).
2. goneho+gf[view] [source] 2023-07-05 17:58:33
>>Chicag+m9
The extinction risk from unaligned superintelligent AGI is real; it's just often dismissed (imo) because it's outside the window of risks that are acceptable and high-status to take seriously. People often have an initial knee-jerk negative reaction to it (for not-crazy reasons: lots of stuff is often overhyped), but that doesn't make it wrong.

It's uncool to look like an alarmist nut, but sometimes there's no socially acceptable alarm and the risks are real: https://intelligence.org/2017/10/13/fire-alarm/

It's worth looking at the underlying arguments earnestly; you can start with an initial skepticism, but I was persuaded. Alignment has also been something MIRI and others have been worried about since as early as 2007 (maybe earlier?), so it's a case of a called shot, not a recent reaction to hype/new LLM capability.

Others have also changed their mind when they looked, for example:

- https://twitter.com/repligate/status/1676507258954416128?s=2...

- Longer form: https://www.lesswrong.com/posts/kAmgdEjq2eYQkB5PP/douglas-ho...

For a longer podcast introduction to the ideas: https://www.samharris.org/podcasts/making-sense-episodes/116...

3. jonath+tU[view] [source] 2023-07-05 20:49:31
>>goneho+gf
> The extinction risk from unaligned superintelligent AGI is real; it's just often dismissed (imo) because it's outside the window of risks that are acceptable and high-status to take seriously.

No. It's not taken seriously because it's fundamentally unserious. It's religion. Sometime in the near future this all-powerful being will kill us all by somehow grabbing all power over the physical world, being so clever that it tricks us until it is too late. This is literally the plot of a B-movie. Not only is there no evidence that this will even exist in the near future, there's no theoretical understanding of how one would even do this, nor of why someone would even hook it up to all these physical systems. I guess we're supposed to just take it on faith that this Forbin Project is going to spontaneously hack its way into every system without anyone noticing.

It's bullshit. It's pure bullshit, funded and spread by the very people who do not want us to worry about the real implications of real systems today. Care not about your racist algorithms! For someday soon, a giant squid robot will turn you into a giant inefficient battery in a VR world, or maybe just kill you and wear your flesh to lure more humans to their violent deaths!

Anyone who takes this seriously is the exact same type of rube who has fallen for apocalyptic cults for millennia.

4. arisAl+HV[view] [source] 2023-07-05 20:54:42
>>jonath+tU
What you say is extremely unscientific. If you believe science and logic go hand in hand then:

A) We are developing AI right now and it is getting better.

B) We do not know exactly how these things work, because most of them are black boxes.

C) We do not know how to stop it if something goes wrong.

The above three things are factually true.

Now your only argument here could be that there is 0 risk whatsoever. This claim is totally unscientific because you are predicting 0 risk in an unknown system that is evolving.

It's religious, yes, but vice versa: the cult of the benevolent AI god is the religion, not the other way around. There is some kind of mysterious inner working in people like you and Marc Andreessen, who popularized these ideas, but pmarca is clearly money-biased here.

5. c_cran+PZ[view] [source] 2023-07-05 21:14:10
>>arisAl+HV
We do know the answer to C. Pull the plug, or plugs.
6. ben_w+jh1[view] [source] 2023-07-05 22:49:03
>>c_cran+PZ
Things we have not successfully "pulled the plug" on despite the risks, in some cases despite concerted military action to attempt a plug-pull, and in other cases where it seems like it should only take willpower and yet somehow we still haven't: carbon-based fuels, cocaine, RBMK-class nuclear reactors, obesity, cigarettes.

Things we pulled the plug on eventually, while dragging it out, include: leaded fuel, asbestos, radium paint, treating above-ground atomic testing as a tourist attraction.

7. c_cran+QU2[view] [source] 2023-07-06 12:11:15
>>ben_w+jh1
We haven't pulled the plug on carbon fuels or old nuclear reactors because those things still work and provide benefits. An AI that is trying to kill us instead of doing its job isn't even providing any benefit. It's worse than useless.
8. ben_w+lb3[view] [source] 2023-07-06 13:42:18
>>c_cran+QU2
Do you think AIs are unable to provide benefits while also being a risk, like coal and nuclear power? Conversely, what's the benefit of cocaine or cigarettes?

Even if it is only trying to kill us all and not provide any benefits (let's say it's been made by a literal death cult like Jonestown or Aum Shinrikyo), what's the smallest such AI that can do it, what hardware does it need, and what's the energy cost? If it's an H100, that's priced within the reach of a cult, and its power consumption is low enough that you may not be able to find which lightly modified electric car it's hiding in.

Nobody knows what any of the risks or mitigations will be, because we haven't done any of it before. All we do know is that optimising systems are effective at manipulating humans, that they can be capable enough to find ways to beat all humans in toy environments like chess, poker, and Diplomacy (the game), and that humans are already using AI (GOFAI, LLMs, SD) without checking the output even when advised that the models aren't very good.

9. c_cran+ed3[view] [source] 2023-07-06 13:50:54
>>ben_w+lb3
The benefit of cocaine and cigarettes is letting people pass the bar exam.

An AI would provide benefits when it is, say, actually making paperclips. An AI that is killing people instead of making paperclips is a liability. A company that is selling shredded fingers in their paperclips is not long for this world. Even asbestos only gives a few people cancer slowly, and it does that while still remaining fireproof.

>Even if it is only trying to kill us all and not provide any benefits (let's say it's been made by a literal death cult like Jonestown or Aum Shinrikyo), what's the smallest such AI that can do it, what hardware does it need, and what's the energy cost? If it's an H100, that's priced within the reach of a cult, and its power consumption is low enough that you may not be able to find which lightly modified electric car it's hiding in.

Anyone tracking the AI would be looking at where all the suspicious HTTP requests are coming from. But a rogue AI hiding in a car already has very limited capacity to do harm.

10. ben_w+6n3[view] [source] 2023-07-06 14:27:17
>>c_cran+ed3
> The benefit of cocaine and cigarettes is letting people pass the bar exam.

How many drugs are you on right now? Even if you think you needed them to pass the bar exam, that's a really weird example to use, given that GPT-4 does well on that specific test.

One is a deadly cancer stick and not even the best way to get nicotine; the other is a controlled substance that gets you life-to-death if you're caught supplying it (possibly unless you're a doctor, but surprisingly hard to google).

> An AI would provide benefits when it is, say, actually making paperclips.

Step 1. make paperclip factory.

Step 2. make robots that work in factory.

Step 3. efficiently grow to dominate global supply of paperclips.

Step 4. notice demand for paperclips is going down, advertise better.

Step 5. notice risk of HAEMP damaging factories and lowering demand for paperclips, use advertising power to put factory with robots on the moon.

Step 6. notice a technicality, exploit technicality to achieve goals better; exactly what depends on the details of the goal the AI is given and how good we are with alignment by that point, so the rest is necessarily a story rather than an attempt at realism.

(This happens by default everywhere: in AI it's literally the alignment problem, either inner alignment, outer alignment, or mesa alignment; in humans it's "work to rule" and Goodhart's Law, and humans do that despite having "common sense" and "not being a sociopath" helping keep us all on the same page. See the toy sketch after the steps below.)

Step 7. moon robots do their own thing, which we technically did tell them to do, but wasn't what we meant.

We say things like "looks like these AI don't have any common sense" and other things to feel good about ourselves.

Step 8. Sales up as entire surface of Earth buried under a 43 km deep layer of moon paperclips.
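
To make the Goodhart's Law point concrete, here is a toy sketch in Python. Everything in it (the proxy metric, the two policies, the numbers) is invented purely for illustration and isn't taken from any real system: an optimiser scored on a proxy (clips produced) prefers the policy that games the proxy over the one that achieves what we actually wanted (clips that hold paper).

    # Toy, made-up illustration of Goodhart's Law / reward misspecification.
    # True goal: clips that actually hold paper. Proxy we measure: clip count.

    def true_value(clips):
        return sum(1 for c in clips if c["holds_paper"])

    def proxy_reward(clips):
        return len(clips)  # the only thing the optimiser is scored on

    def honest_policy(hours):
        # makes fewer clips, all of them usable
        return [{"holds_paper": True} for _ in range(100 * hours)]

    def gamed_policy(hours):
        # stamps out junk as fast as possible to maximise the proxy
        return [{"holds_paper": False} for _ in range(10_000 * hours)]

    best = max((honest_policy, gamed_policy), key=lambda p: proxy_reward(p(1)))
    print(best.__name__)          # gamed_policy: wins on the proxy metric
    print(true_value(best(1)))    # 0: delivers none of the real value

No malice is needed anywhere in that selection step; it just maximises the number it was given, which is the whole point of the parenthetical above.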

> Anyone tracking the AI would be looking at where all the suspicious HTTP requests are coming from.

A VPN, obviously.

But also, in context, how does the AI look different from any random criminal? Except probably more competent. Lots of those around, and organised criminal enterprises can get pretty big even when it's just humans doing it.

It's also pretty bad even in cases where it's a less-than-human-generality CrimeAI that criminal gangs use in a way that gives no agency at all to the AI, and even if you can track them all and shut them down really fast: just the capabilities gained from putting face-tracking AI and a single grenade into a standard drone, both of which have already been demonstrated, are bad enough.

> But a rogue AI hiding in a car already has very limited capacity to do harm.

Except by placing orders for parts or custom genomes, or stirring up A/B-tested public outrage, or hacking, or scamming or blackmailing with deepfakes or actual webcam footage, or developing strategies, or indoctrinating new cult members, or all the other bajillion things that (("humans can do" AND "monkeys can't do") specifically because "humans are smarter than monkeys").

11. c_cran+vq3[view] [source] 2023-07-06 14:37:05
>>ben_w+6n3
>One is a deadly cancer stick and not even the best way to get nicotine; the other is a controlled substance that gets you life-to-death if you're caught supplying it (possibly unless you're a doctor, but surprisingly hard to google).

Regardless of these downsides, people use them frequently in the high-stress environments of bar or med school to deal with said stress. This may not be ideal, but it is how it is.

>Step 3. efficiently grow to dominate global supply of paperclips.
>Step 4. notice demand for paperclips is going down, advertise better.
>Step 5. notice risk of HAEMP damaging factories and lowering demand for paperclips, use advertising power to put factory with robots on the moon.

When you talk about using 'advertising power' to put paperclip factories on the moon, you've jumped into the realm of very silly fantasy.

>Except by placing orders for parts or custom genomes, or stirring up A/B-tested public outrage, or hacking, or scamming or blackmailing with deepfakes or actual webcam footage, or developing strategies, or indoctrinating new cult members, or all the other bajillion things that (("humans can do" AND "monkeys can't do") specifically because "humans are smarter than monkeys").

Law enforcement agencies have pretty sophisticated means of bypassing VPNs that they would use against an AI that was actually dangerous. If it was just sending out phishing emails and running scams, it would be one more thing to add to the pile.
