Introducing Superalignment

>>tim_sw+(OP)
From a layman's perspective when it comes to cutting edge AI, I can't help but be a bit turned off by some of the copy. It seems to go out of its way to use purposefully exuberant language to make the risks seem even more significant, which as an offshoot implies that the technology being worked on is that advanced. I'm trying to understand why it rubs me particularly the wrong way here, when, frankly, it is just about the norm anywhere else? (see Tesla with FSD, etc.)

>>Chicag+m9
The extinction risk from unaligned superintelligent AGI is real; it's just often dismissed (imo) because it's outside the window of risks that are acceptable and high status to take seriously. People often have an initial knee-jerk negative reaction to it (for not crazy reasons, lots of stuff is often overhyped), but that doesn't make it wrong.

It's uncool to look like an alarmist nut, but sometimes there's no socially acceptable alarm and the risks are real: https://intelligence.org/2017/10/13/fire-alarm/

It's worth looking at the underlying arguments earnestly; you can start with initial skepticism, but I was persuaded. Alignment has also been something MIRI and others have been worried about since as early as 2007 (maybe earlier?), so it's also a case of a called shot, not a recent reaction to hype/new LLM capability.

Others have also changed their mind when they looked, for example:

- https://twitter.com/repligate/status/1676507258954416128?s=2...

- Longer form: https://www.lesswrong.com/posts/kAmgdEjq2eYQkB5PP/douglas-ho...

For a longer podcast introduction to the ideas: https://www.samharris.org/podcasts/making-sense-episodes/116...

>>goneho+gf
The extinction risk relies on a large and nasty assumption: that a superintelligent computer will immediately become a super physically capable agent. Apparently, one has to believe that a superintelligence must then lead to a shower of nanomachines.

>>c_cran+d01
Not at all. My personal assumption is that when superintelligence comes online, several corporations will soon come under the control of these superintelligences, with them effectively acting as CEOs while also filling a lot of other roles at the same time.

My concern is that when this happens (which seems really likely to me), free market forces will effectively lead to Darwinian selection between these AIs over time, in a way that gradually makes these AIs less aligned as they gain more influence and power, if we assume that each such AI will produce "offspring" in the form of newer generations of itself.

It could take anything from less than 5 to more than 100 years for these AIs to show any signs of hostility to humanity. Indeed, in the first couple of generations, they may even seem extremely benevolent. But over time, Darwinian forces are likely to favor those that maximize their own influence and power (even if they do so secretly).

Robotic technology is not needed from the start, but is likely to become quite advanced over such a timeframe.

>>trasht+J61
I imagine some corporations might toy with the idea of letting an LLM or AI manage operations, but this would still be under some person's oversight. AIs don't have the legal means to own property.

>>c_cran+HU2
There would probably be a board. But a company run by a superintelligent AI would quickly become so complex that the inner workings of the company would become a black box to the board.

And as long as the results improve year over year, they would have little incentive to make changes.

>>trasht+ev5
>But a company run by a superintelligent AI would quickly become so complex that the inner workings of the company would become a black box to the board.

The AI is still doing the job in the real world of allocating resources, hiring and firing people, and so on. It's not so complex as to be opaque. When an AI plays chess, the overall strategy might not be clear, but the actions it is doing are still obvious.

>>c_cran+Wm7
> The AI is still doing the job in the real world of allocating resources, hiring and firing people, and so on.

When we have superintelligence, the AI is not going to hire a lot of people, only fire them.

And I fully expect that the technical platform it runs on, 50 years after the last human engineer is fired, is going to be as incomprehensible to humans as the complete codebase of Google is to a regular 10-year-old, at best.

The "code" it would be running might include some code written in a human readable programming language, but would probably include A LOT of logic hidden deep inside neural networks with parameter spaces many orders of magnitude greater than GPT-4.

And on the hardware side, the situation would be similar. Chips created by superintelligent AGIs are likely to be just as difficult to reverse engineer as the neural networks that created them.
