zlacker

Introducing Superalignment

submitted by tim_sw+(OP) on 2023-07-05 17:04:29 | 219 points 370 comments
[view article] [source] [go to bottom]

NOTE: showing posts with links only [show all posts]
◧◩
6. famous+Va[view] [source] [discussion] 2023-07-05 17:44:23
>>bluefi+v8
>>36604019
◧◩
14. goneho+gf[view] [source] [discussion] 2023-07-05 17:58:33
>>Chicag+m9
The extinction risk from unaligned superintelligent AGI is real; it's just often dismissed (imo) because it's outside the window of risks that are acceptable and high-status to take seriously. People often have an initial knee-jerk negative reaction to it (for not-crazy reasons; lots of stuff is overhyped), but that doesn't make it wrong.

It's uncool to look like an alarmist nut, but sometimes there's no socially acceptable alarm and the risks are real: https://intelligence.org/2017/10/13/fire-alarm/

It's worth looking at the underlying arguments earnestly; you can approach them with initial skepticism, but I was persuaded. Alignment has also been something MIRI and others have worried about since as early as 2007 (maybe earlier?), so it's a case of a called shot, not a recent reaction to hype or new LLM capabilities.

Others have also changed their mind when they looked, for example:

- https://twitter.com/repligate/status/1676507258954416128?s=2...

- Longer form: https://www.lesswrong.com/posts/kAmgdEjq2eYQkB5PP/douglas-ho...

For a longer podcast introduction to the ideas: https://www.samharris.org/podcasts/making-sense-episodes/116...

◧◩◪
25. visarg+Ak[view] [source] [discussion] 2023-07-05 18:15:11
>>Jimthe+ca
It might not necessarily have a bad effect. It would create new capabilities and those will be followed by new products. AI is amazing at demand induction.

https://en.wikipedia.org/wiki/Induced_demand

26. thanat+Dk[view] [source] 2023-07-05 18:15:27
>>tim_sw+(OP)
I don't understand how people can still ignore this: https://plato.stanford.edu/entries/arrows-theorem/

There's also a whole map-territory problem where we're still pretending the distinction hasn't collapsed, Baudrillard-style. As if we weren't all obsessed with "prompt engineering" (whereby the machine trains us).

◧◩
32. goneho+qm[view] [source] [discussion] 2023-07-05 18:22:10
>>ilaksh+nk
The recent paper about using GPT-4 to give more insight into its actual internals was interesting, but yeah, the risk that we'd accidentally develop unaligned AGI before figuring out alignment seems really high at the moment.

Out of the options to reduce that risk I think it would really take something like this, which also seems extremely unlikely to actually happen given the coordination problem: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-no...

You talk about aligned agents - but there aren't any today and we don't know how to make them. It wouldn't be aligned agents vs. unaligned ones; there would only be unaligned ones.

I don't think spreading out the tech reduces the risk. Spreading out nuclear weapons doesn't reduce the risk (and with nukes at least it's a lot easier to control the fissionable materials). Even with nukes you can create them and decide not to use them; that's not so true with superintelligent AGI.

If anyone could have made nukes from their computer humanity may not have made it.

I'm glad OpenAI understands the severity of the problem though and is at least trying to solve it in time.

56. cr4zy+zs[view] [source] 2023-07-05 18:44:07
>>tim_sw+(OP)
Allocating 20% to safety would not be enough if safety and capability aren't aligned; that is, unless Bostrom's orthogonality thesis is mostly wrong. However, I believe they may be sufficiently aligned in the long term for 20% to work [1]. The biggest threat imo is that more resources are devoted to AIs with military or monetary objectives focused on shorter-term capability and power. In that case, capability and safety are not aligned and we race to the bottom. Hopefully global coordination and this effort to achieve superalignment in four years will avoid that.

[1] https://drive.google.com/file/d/1rdG5QCTqSXNaJZrYMxO9x2ChsPB...

◧◩
68. Dennis+My[view] [source] [discussion] 2023-07-05 19:09:53
>>Animat+es
That's an accurate assessment of the situation, according to every AI alignment researcher I've seen talk about it, including the relatively optimistic ones. This includes people who are mainly focused on AI capabilities but have real knowledge of alignment.

This part in particular caught my eye: "Other assumptions could also break down in the future, like favorable generalization properties during deployment". There have been actual experiments in which AIs appeared to successfully learn their objective in training, and then did something unexpected when released into a broader environment.[1]

I've seen some leading AI researchers dismiss alignment concerns, but without actually engaging with the arguments at all. I've seen no serious rebuttals that actually address the things the alignment people are concerned about.

[1] https://www.youtube.com/watch?v=zkbPdEHEyEI

◧◩◪◨
84. Dennis+vP[view] [source] [discussion] 2023-07-05 20:24:17
>>chaxor+ip
To save others the trouble, I googled Voyager, it's pretty interesting. I had no idea an LLM could do this sort of thing:

https://voyager.minedojo.org/

◧◩◪◨⬒⬓⬔
86. gooseu+ZQ[view] [source] [discussion] 2023-07-05 20:30:13
>>Dennis+DM
"What if" is all these "existential risk" conversations ever are.

Where is your evidence that we're approaching human level AGI, let alone SuperIntelligence? Because ChatGPT can (sometimes) approximate sophisticated conversation and deep knowledge?

How about some evidence that ChatGPT isn't even close? Just clone and run OpenAI's own evals repo https://github.com/openai/evals on the GPT-4 API.

It performs terribly on novel logic puzzles and exercises that a clever child could learn to do in an afternoon (there are some good chess evals, and I submitted one asking it to simulate a Forth machine).
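
For the curious, a minimal sketch of the kind of ad-hoc probe being described: hit the GPT-4 API directly with a small puzzle and check the answer. This assumes the mid-2023 openai Python client and an OPENAI_API_KEY in the environment; the toy puzzle is illustrative (you'd substitute harder ones - chess positions, a Forth machine, etc.) and is not taken from the evals repo.

    # Quick-and-dirty probe of GPT-4 on a small logic puzzle via the API.
    # Assumes the 2023-era openai Python package (openai.ChatCompletion) and
    # OPENAI_API_KEY set in the environment; puzzle and answer are illustrative.
    import openai

    PUZZLE = (
        "Alice is taller than Bob. Bob is taller than Carol. "
        "Carol is taller than Dave. Who is the second shortest? "
        "Answer with just the name."
    )
    EXPECTED = "carol"

    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PUZZLE}],
        temperature=0,
    )
    answer = resp["choices"][0]["message"]["content"].strip().lower()
    print("model said:", answer, "| correct:", EXPECTED in answer)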

◧◩◪◨⬒
97. famous+wW[view] [source] [discussion] 2023-07-05 20:58:44
>>Dennis+vP
Other examples (in the real world) you might find interesting:

https://tidybot.cs.princeton.edu/

https://innermonologue.github.io/

https://palm-e.github.io/

https://www.microsoft.com/en-us/research/group/autonomous-sy...

◧◩◪◨⬒⬓
104. Animat+iZ[view] [source] [discussion] 2023-07-05 21:11:31
>>famous+wW
> https://palm-e.github.io/

The alignment problem will come up when the robot control system notices that the guy with the stick is interfering with the robot's goals.

◧◩◪◨⬒
108. mptest+P01[view] [source] [discussion] 2023-07-05 21:17:48
>>arisAl+HV
All of this discussion really makes me think of Robert Miles' "Is AI safety a Pascal's mugging?" from 4 years(!) ago[0]. As far as I can tell as a layman, AI safety researchers have been having this discussion for years... Maybe we can look to them for insight into these questions?

[0] https://youtu.be/JRuNA2eK7w0

◧◩◪◨
124. woadwa+m71[view] [source] [discussion] 2023-07-05 21:52:22
>>crop_r+dm
Meanwhile, GPT-4 still can’t reliably multiply small numbers.

https://arxiv.org/abs/2304.02015

◧◩◪◨⬒⬓
130. matt_h+y81[view] [source] [discussion] 2023-07-05 21:59:17
>>miohta+p41
Sam doesn't have much financial upside from OpenAI (reportedly, he doesn't have any equity).

And he wrote about the risk in 2015, months before OpenAI was founded:

https://blog.samaltman.com/machine-intelligence-part-1

https://blog.samaltman.com/machine-intelligence-part-2

Fine if you disagree with his arguments, but why assume you know what his motivation is?

◧◩◪◨
135. famous+tc1[view] [source] [discussion] 2023-07-05 22:20:38
>>Dennis+F31
https://openai.com/research/language-models-can-explain-neur...
◧◩
136. cubefo+Kc1[view] [source] [discussion] 2023-07-05 22:22:31
>>skepti+pe
It's the other way round: Just accusing people of being in a cult is unscientific. There are plenty of arguments that AI x-risk is real.

E.g. by Yoshua Bengio: https://yoshuabengio.org/2023/06/24/faq-on-catastrophic-ai-r...

◧◩◪◨⬒
144. famous+7g1[view] [source] [discussion] 2023-07-05 22:41:37
>>woadwa+m71
It does alright with algorithmic prompting - https://arxiv.org/abs/2211.09066

Also, it knows when to use a calculator if it has access to one, so it's not a big deal.
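
As a rough illustration of what "knows when to use a calculator" can look like in practice, here's a minimal sketch of a calculator-tool loop. It's my own construction, not the setup from the paper above: the CALC() convention, the prompt wording, and the 2023-era openai client call are all assumptions.

    # Minimal calculator-tool loop: the model is told to emit CALC(<expression>)
    # instead of doing arithmetic itself; we evaluate the expression locally and
    # hand the result back for the final answer. Conventions here are illustrative.
    import re
    import openai

    SYSTEM = (
        "When you need arithmetic, do not compute it yourself. "
        "Reply with exactly CALC(<expression>) and wait for the result."
    )

    def ask(messages):
        resp = openai.ChatCompletion.create(model="gpt-4", messages=messages, temperature=0)
        return resp["choices"][0]["message"]["content"]

    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "What is 4817 * 2923?"},
    ]
    reply = ask(messages)

    match = re.search(r"CALC\((.+?)\)", reply)
    if match:
        # Evaluate the requested arithmetic locally (a real tool would sandbox this).
        result = eval(match.group(1), {"__builtins__": {}}, {})
        messages += [
            {"role": "assistant", "content": reply},
            {"role": "user", "content": f"CALC result: {result}"},
        ]
        reply = ask(messages)

    print(reply)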

◧◩◪◨
146. famous+Xg1[view] [source] [discussion] 2023-07-05 22:46:55
>>c_cran+d01
LLMs are fairly capable physical agents already. Nothing large about the assumption at all. Not that a robotic threat is even necessary.

https://tidybot.cs.princeton.edu/

https://innermonologue.github.io/

https://palm-e.github.io/

https://www.microsoft.com/en-us/research/group/autonomous-sy...

◧◩◪◨⬒⬓
169. lcnPyl+7v1[view] [source] [discussion] 2023-07-06 00:17:57
>>mptest+P01
At this point, with so many of them disagreeing and with so many varying details, one will choose the expert insight which most closely matches their current beliefs.

I hadn’t encountered Pascal’s mugging (https://en.wikipedia.org/wiki/Pascal%27s_mugging) before, and the comparison is indeed pretty apt. I think I’m on the side that it’s not one, assuming the idea is that it’s a Very Low Chance of a Very Bad Thing -- the “muggee” hands over their wallet because of the magnitude of the VBT despite its tiny probability. It seems like there’s a rather high chance here if (proverbially) the AI-cat is let out of the bag.

But maybe some Mass Effect nonsense will happen if we develop AGI and we’ll be approached by The Intergalactic Community and have our technology advanced millennia overnight. (Sorry, that’s tongue-in-cheek but it does kinda read like Pascal’s mugging in the opposite direction; however, that’s not really what most researchers are arguing.)
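
For what it's worth, the "is this a Pascal's mugging?" question mostly turns on the probability you assign, which a two-line expected-value comparison makes concrete (numbers below are purely illustrative assumptions, not estimates from anyone in this thread):

    # Toy expected-loss comparison; the "badness" of the outcome is normalized to 1.0.
    def expected_loss(p_catastrophe, badness=1.0):
        return p_catastrophe * badness

    print(expected_loss(1e-12))  # mugging-sized probability: expected loss ~0
    print(expected_loss(0.1))    # "rather high chance" framing: expected loss 0.1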

◧◩◪◨
171. wickof+vw1[view] [source] [discussion] 2023-07-06 00:26:54
>>ben_w+yq1
But you're also autocomplete (prediction engine) on steroids.

https://www.psy.ox.ac.uk/news/the-brain-is-a-prediction-mach...

◧◩◪◨⬒
177. jonath+kA1[view] [source] [discussion] 2023-07-06 00:52:59
>>arisAl+HV
Only two of those things are true, and the first led you to the fallacy of expecting trends to continue unabated. As I stated in a previous comment when this topic came up, airplanes had exponential growth in speed from their inception at 44 mph to 2193 mph just 79 years later. If these trends continue, the top speed of an airplane will be set this year at Mach 43. (Yes, I actually fit the curve.)[0]

How do you stop a crazy AI? You turn it off.

But please, keep on worrying about fantasy bogeymen instead of actual harms today, and never EVER question why.

[0] >>36038681
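
For reference, a minimal sketch of the kind of naive two-point exponential extrapolation being mocked here. It's my own reconstruction fit through just the two endpoints, so the exact figure differs from the Mach 43 number above, but the shape of the fallacy is the same:

    # Naive exponential extrapolation of airplane top speed, fit through only two
    # points: ~44 mph at inception (1903) and 2193 mph about 79 years later.
    V0, YEAR0 = 44.0, 1903
    V1, YEAR1 = 2193.0, 1982

    growth_per_year = (V1 / V0) ** (1.0 / (YEAR1 - YEAR0))  # fitted annual growth factor

    def extrapolated_speed(year):
        return V0 * growth_per_year ** (year - YEAR0)

    mph_2023 = extrapolated_speed(2023)
    print(f"{mph_2023:,.0f} mph, roughly Mach {mph_2023 / 767:.0f}")  # absurd, which is the point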

188. nmca+TI1[view] [source] 2023-07-06 01:57:45
>>tim_sw+(OP)
Worth pointing out, imo, that if you think this problem is not real then you are asserting you understand AI better than Geoffrey Hinton, Yoshua Bengio and Ilya Sutskever.

If you don't know who they are, then well I guess that makes sense.

If you do know who they are and your confidence is wavering then [0] is a great place to get started understanding the alignment problem.

OAI is a great place to work and the team is hiring for engineers and scientists.

[0] https://80000hours.org/problem-profiles/artificial-intellige...

◧◩◪◨
196. ctoth+bN1[view] [source] [discussion] 2023-07-06 02:29:26
>>junon+hs1
Here's an article[0] and a good short story[1] explaining exactly this.

[0]: No Physical Substrate, No Problem https://slatestarcodex.com/2015/04/07/no-physical-substrate-...

[1]: It Looks Like You're Trying To Take Over The World https://gwern.net/fiction/clippy

◧◩◪◨⬒⬓⬔
204. climat+wR1[view] [source] [discussion] 2023-07-06 03:03:02
>>nights+4R1
They're aligned with the military-industrial complex. The US military is one of the biggest consumers of fossil fuels[1], and it's the same story with other nations' militaries and their energy use. So profitable is not the same as aligned with human values.

1: https://en.m.wikipedia.org/wiki/Energy_usage_of_the_United_S...

◧◩◪◨⬒⬓⬔
221. climat+122[view] [source] [discussion] 2023-07-06 04:16:20
>>flagra+iZ1
There are no enemies. The biosphere is a singular organism, and right now people are doing their best to basically destroy all of it. The only way to prevent further damage is to reduce the human population, but that's another non-starter, so as long as the human population keeps increasing, the people in charge will be compelled to keep pushing for more technological "innovation", because technology is the best way to control 8B+ people[1].

Very few people are actually alarmed about the right issues (in no particular order): population size, industrial pollution, the military-industrial complex, for-profit multinational corporations, digital surveillance, factory farming, global warming, etc. This is why the alarmism from the AI crowd seems disingenuous: AI progress is simply an extension of for-profit corporatism and exploitation applied to digital resources, and properly addressing the risk from AI would require addressing the actual root causes of why technological progress is misaligned with human values.

1: https://www.theguardian.com/world/2015/jul/24/france-big-bro...

◧◩◪◨
248. ben_w+um2[view] [source] [discussion] 2023-07-06 07:25:15
>>junon+hs1
Did you ever play the old "Pandemic" flash game? https://tvtropes.org/pmwiki/pmwiki.php/VideoGame/Pandemic

That the origin of COVID is even a question implies we have the tech to do it artificially. An AI today treating real life as that game would be self-destructive, but that doesn't mean it won't happen (reference classes: insanity, cancer).

If the AI can invent and order a von Neumann probe — the first part is the hard part, custom parts orders over the internet are already a thing — that it can upload itself to, then it can block out (and start disassembling) the sun in a matter of decades with reasonable-looking reproduction rates (though obviously we're guessing at what "reasonable" looks like, as we have only organic VN machines to frame the question against).

Or an AI taking over brain implants and turning them against everyone without them, like a zombie war (potentially Neuralink, depending on how secure the software is; it's also a plot device in the web fiction serial The Deathworlders). That's futuristic sci-fi, and you may not be OK with sci-fi as a way to explore hypotheticals, but I think it's the only way until we get moon-sized telescopes to watch such things play out on other worlds without going there. (In that story the same AI genocides multiple species over millions of years, as an excuse for why humans can even take part in the events of the story.)

◧◩◪◨⬒
250. ben_w+on2[view] [source] [discussion] 2023-07-06 07:32:16
>>wickof+vw1
"It's one of those irregular verbs, isn't it? I'm good at improv and speaking on my feet, you finish each other's sentences, they're just autocomplete on steroids."

https://en.wikiquote.org/wiki/Yes,_Minister

◧◩◪◨⬒⬓
295. goneho+GV2[view] [source] [discussion] 2023-07-06 12:15:53
>>lukesc+1R2
It’s worth reading about the orthogonality thesis and the underlying arguments about it.

It’s not an anti-goal that’s intentionally set; it’s that specifying complex goals is hard, and you may end up with something dumb that maximizes the reward unintentionally.

The issue is all of the AGIs will be unaligned in different ways because we don’t know how to align any of them. Also, the first to be able to improve itself in pursuit of its goal could take off at some threshold and then the others would not be relevant.

There’s a lot of thoughtful writing that exists on this topic and it’s really worth digging into the state of the art about it, your replies are thoughtful so it sounds like something you’d think about. I did the same thing a few years ago (around 2015) and found the arguments persuasive.

This is a decent overview: https://www.samharris.org/podcasts/making-sense-episodes/116...
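
If it helps make "something dumb that maximizes the reward unintentionally" concrete, here's a toy specification-gaming sketch of my own construction (not from any of the writing linked here): the designer wants the agent to reach a goal cell and writes a proxy reward for being near it, and the proxy-optimal behavior turns out to be hovering next to the goal forever rather than ever reaching it.

    # Toy specification gaming: brute-force search finds the behavior that maximizes
    # a proxy reward ("be near the goal") rather than the intended one ("reach it").
    from itertools import product

    CELLS, GOAL, HORIZON = 5, 4, 6      # 1-D corridor of cells 0..4, goal at the end
    ACTIONS = (-1, 0, +1)               # left, stay, right

    def rollout(actions, start=0):
        cell, traj = start, []
        for a in actions:
            cell = max(0, min(CELLS - 1, cell + a))
            traj.append(cell)
            if cell == GOAL:            # reaching the goal ends the episode
                break
        return traj

    def proxy_reward(traj):             # designer's proxy: +1 per step spent near the goal
        return sum(1 for c in traj if abs(c - GOAL) <= 1)

    def intended_reward(traj):          # what the designer actually wanted
        return 1 if GOAL in traj else 0

    best = max(product(ACTIONS, repeat=HORIZON), key=lambda a: proxy_reward(rollout(a)))
    traj = rollout(best)
    print("proxy-optimal trajectory:", traj)   # hovers at cell 3, never reaches the goal
    print("proxy:", proxy_reward(traj), "intended:", intended_reward(traj))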

◧◩◪◨⬒⬓⬔⧯
296. goneho+fX2[view] [source] [discussion] 2023-07-06 12:26:17
>>janals+Cu2
At the limit, sure, there’s variance, but our shared selected history means we have a lot in common, something a non-human intelligence would not get for free: https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden...

I’m also not a moral relativist - I don’t think all values are equivalent - but you don’t even need to go there: before that point, a lot of what humans want is uncontroversial, and yet even the “obvious” cases turn out not to be so obvious or easy to classify.

◧◩◪◨⬒
298. reveli+G23[view] [source] [discussion] 2023-07-06 13:00:52
>>distor+vK1
Bee extinction wasn't addressed; it was just revealed not to be true. Article with lots of data here:

https://www.acsh.org/news/2018/04/17/bee-apocalypse-was-neve...

Mass starvation wasn't "addressed" exactly, because the predictions were for mass starvation in the west, which never happened. Also the people who predicted this weren't the ones who created the Green Revolution.

The ozone hole is, I think, the most valid example in the list, but who knows, maybe that was just BS too. A lot of scientific claims turn out that way these days, even ones that were accepted for quite a while.

◧◩◪◨⬒⬓⬔⧯▣▦▧
300. climat+o73[view] [source] [discussion] 2023-07-06 13:24:27
>>goneho+vQ2
Seeking profit and constant population growth are already extremely dumb goals on their own. You can continue worrying about AGI if you want, but nothing I've said is either cynical or anti-human. It is simply a description of the global techno-industrial economic system and its total blindness to all the negative externalities of cancerous growth. Continued progress in AI capabilities does not change the dynamics of the machine that is destroying the biosphere, and it never will, because it is an extension of profit-seeking, exploitative corporate practices carried over to the digital sphere. Addressing the root causes of misalignment would require getting rid of profit motives and accounting for all the metabolic byproducts of human economic activity and consumption. Unless the AI alarmists have a solution to those things, they're just creating another distraction and diverting attention away from the actual problems[1].

1: https://www.nationalgeographic.com/environment/article/plast...

◧◩◪◨⬒⬓⬔⧯▣▦
310. ben_w+3g3[view] [source] [discussion] 2023-07-06 14:02:04
>>c_cran+y83
Cunning absolutely should count as an aspect of intelligence.

If this is just a definitions issue, s/artificial intelligence/artificial cunning/g to the same effect.

Strength seems somewhat irrelevant either way, given the existence of Windows for Warships[0].

[0] not the real name: https://en.wikipedia.org/wiki/Submarine_Command_System

◧◩◪◨⬒⬓
336. ben_w+OD4[view] [source] [discussion] 2023-07-06 19:09:47
>>djur+Wv1
Because humans overestimate the upside, underestimate the downside, and are often too lazy to check the output.

There's power and prestige in money, too, not just the positions.

Hence the lawyers who got in trouble for outsourcing themselves to ChatGPT: https://www.reuters.com/legal/new-york-lawyers-sanctioned-us...

Or those t-shirts from a decade back: https://money.cnn.com/2013/06/24/smallbusiness/tshirt-busine...

◧◩◪◨
341. cubefo+w05[view] [source] [discussion] 2023-07-06 20:40:20
>>thunks+S02
There is. AI progress in the last 10 years was massive, and there is no end in sight. The risks are well established in countless essays. See e.g. https://yoshuabengio.org/2023/06/24/faq-on-catastrophic-ai-r...
◧◩◪◨⬒⬓⬔⧯▣▦▧▨◲
351. climat+ER5[view] [source] [discussion] 2023-07-07 01:43:36
>>flagra+TI5
The current model is already destructive, and most of the market is managed by artificial agents. Schwab will give you a roboadvisor to manage your retirement account, so AI is already managing large chunks of the financial markets. Letting AI manage not just financial assets but things like farmland is an obvious extension of the same principle, and since AIs can notice more patterns, it's going to become basically a necessity, because global warming is going to make large parts of existing farmland unmanageable. Floods and droughts are becoming more common, and humans are very bad at figuring out the weather, so there will be an AI agent monitoring weather patterns and allocating seeds to various plots of land to maximize yields.

Bill Gates has bought up a bunch of farmland, and I am certain he will use AI to manage it, because manual allocation will be too inefficient[1].

1: https://www.popularmechanics.com/science/environment/a425435...

[go to top]