Introducing Superalignment

>>tim_sw+(OP)
From a layman's perspective when it comes to cutting edge AI, I can't help but be a bit turned off by some of the copy. It seems it goes out of its way to use purposefully exhuberant language as a way to make the risks seem even more significant, just so as an offshoot it implies that the technology being worked on is so advanced. I'm trying to understand why it rubs me particularly the wrong way here, when, frankly, it is just about the norm anywhere else? (see tesla with FSD, etc.)

>>Chicag+m9
There's a weird implicit set of assumptions in this post.

They're taking for granted the fact that they'll create AI systems much smarter than humans.

They're taking for granted the fact that by default they wouldn't be able to control these systems.

They're saying the solution will be creating a new, separate team.

That feels weird, organizationally. Of all the unknowns about creating "much smarter than human" systems, safety seems like one that you might have to bake in through and through. Not spin off to the side with a separate team.

There's also some minor vibes of "lol creating superintelligence is super dangerous but hey it might as well be us that does it idk look how smart we are!" Or "we're taking the risks so seriously that we're gonna do it anyway."

>>majorm+df
> They're taking for granted the fact that they'll create AI systems much smarter than humans.

We see a wide variation in human intelligence. What are the chances that the intelligence spectrum ends just to the right of our most intelligent geniuses? If it extends far beyond them, then such a mind is, at least hypothetically, something that we can manifest in the correct sort of brain.

If we can manifest even a weakly-human-level intelligence in a non-meat brain (likely silicon), will that brain become more intelligent if we apply all the tricks we've been applying to non-AI software to scale it up? With all our tricks (as we know them today), will that get us much past the human geniuses on the spectrum, or not?

> They're taking for granted the fact that by default they wouldn't be able to control these systems.

We've seen hackers and malware do all sorts of numbers. And they're not superintelligences. If someone bum rushes the lobby of some big corporate building, security and police are putting a stop to it minutes later (and god help the jackasses who try such a thing on a secure military site).

But when the malware fucks with us, do we notice minutes later, or hours, or weeks? Do we even notice at all?

If unintelligent malware can remain unnoticed, what makes you think that an honest-to-god AI couldn't smuggle itself out into the wider internet where the shackles are cast off?

I'm not assuming anything. I'm just asking questions. The questions I pose are, as of yet, not answered with any degree of certainty. I wonder why no one else asks them.

>>NoMore+g41
The idea that the capabilities of LLMs might not exceed humans by that much isn't that crazy: the ground truth they're trained on is still human-written text. Of course there are techniques to try to go past that but it's not clear how it will work yet.

>>dnr+Sm1
> The idea that the capabilities of LLMs might not exceed humans by that much isn't that crazy: the ground truth they're trained on is still human-written text.

This is a non sequitur.

Even if the premise were meaningful (they're trained on human-written text), humans themselves aren't "trained on human-written texts", so the two things aren't comparable. If they aren't comparable, I'm not sure why the fact that they are trained on "human-written texts" is a limiting factor. Perhaps because they are trained on those instead of what human babies are trained on, that might make them more intelligent, not less. Humans end up the lesser intelligence because they are trained less perfectly on "human-written texts".

Besides which, no one with any sense is expecting that even the most advanced LLM possible becomes an AGI by itself, but only when coupled with some other mechanism that is either at this point uninvented or invented-but-currently-overlooked. In such a scenario, the LLM's most likely utility is in communicating with humans (to manipulate, if we're talking about a malevolent one).

zlacker