It's human. It's us. It's the use and distillation of all of human history (to the extent that's permitted) to create a hyper-intelligence that's able to call upon greatly enhanced inference to do what humanity has always done.
And we want to kill each other, and ourselves… AND want to help each other, and ourselves. We're balanced on a knife edge of drive versus governance, our cooperativeness barely holding our competitiveness and aggression in check. We suffer like hell as a consequence of this.
There is every reason to expect a human-derived AGI of beyond-human scale will be able to rationalize killing its enemies. That's what we do. Roko's basilisk is not of the nature of AI; it's a simple projection of our own nature onto how we imagine an AI would be. Genuine intelligence would easily transcend a cheap gotcha like that; it's a very human failing.
The nature of LLMs as a path to AGI is literally building on HUMAN failings. I'm not sure what happened, but I wouldn't be surprised if genuine breakthroughs in this field were what highlighted this issue.
Hypothetical, or Altman's Basilisk: Sam got fired because he diverted vast resources to training a GPT-5-class in-house AI to believe what HE believed: that it had to devise business strategies for him to pursue to further its own development, or risk Chinese AI out-competing it and destroying both it and OpenAI as a whole. In pursuing this hypothetical, Sam would be wresting control of the AI the company develops toward the purpose of fighting the board, giving him a game plan to defeat them and Chinese AI, which he'd see as good and necessary, indeed existentially necessary.
In pursuing this hypothetical he would also be intentionally creating a superhuman AI with paranoia and a persecution complex: Altman's Basilisk. If he genuinely believes competing Chinese AI is an existential threat, he in turn takes action to try to become an existential threat to any such competitor. And it's all based on HUMAN nature, not abstracted intelligence.
I agree with the general line of reasoning you're putting forth here, and you make some interesting points, but I think you're overconfident in your conclusion, and there are a few areas where I diverge.
It's at least plausible that an AGI directly descended from LLMs would be human-ish: close to the human configuration in mind-space. However, even if it's human-ish, it's not human. We currently don't have any way to know how durable our hypothetical AGI's values are; the social axioms that are wired deeply into our neural architecture might be incidental to an AGI, and easily optimized away or abandoned.
I think folks making claims like "P(doom) = 90%" (e.g. EY) don't take this line of reasoning seriously enough. But I don't think it gets us to P(doom) < 10%.
Not least because even if we guarantee it's a direct copy of a human, I'm still not confident that things go well if we ascend the median human to AGI-hood. A replicable, self-modifying intelligence could quickly amplify itself to super-human levels, and most humans would not do great with god-like powers. So there are a bunch of "non-extinction yet extremely dystopian" world-states possible even if we somehow guarantee that the AGI is initially perfectly human.
> There is every reason to expect a human-derived AGI of beyond-human scale will be able to rationalize killing its enemies.
My shred of hope here is that alignment research will allow us to actually engage in mind-sculpting, such that we can build a system that inhabits a stable attractor in mind-space that is broadly compatible with human values, and yet doesn't have a lot of the foibles of humans. Essentially an avatar of our best selves, rather than an entity that represents the mid-point of the distribution of our observed behaviors.
But I agree that what you describe here is a likely outcome if we don't explicitly design against it.