It's uncool to look like an alarmist nut, but sometimes there's no socially acceptable alarm and the risks are real: https://intelligence.org/2017/10/13/fire-alarm/
It's worth looking at the underlying arguments earnestly. You can go in with initial skepticism, but I was persuaded. Alignment has also been something MIRI and others have worried about since as early as 2007 (maybe earlier?), so it's also a case of a called shot, not a recent reaction to hype or new LLM capability.
Others have also changed their minds when they looked, for example:
- https://twitter.com/repligate/status/1676507258954416128?s=2...
- Longer form: https://www.lesswrong.com/posts/kAmgdEjq2eYQkB5PP/douglas-ho...
For a longer podcast introduction to the ideas: https://www.samharris.org/podcasts/making-sense-episodes/116...
My concern is that when this happens (which seems really likely to me), free-market forces will effectively lead to Darwinian selection among these AIs over time, in a way that gradually makes them less aligned as they gain more influence and power, if we assume that each such AI will produce "offspring" in the form of newer generations of itself.
It could take anything from fewer than 5 to more than 100 years for these AIs to show any signs of hostility to humanity. Indeed, in the first couple of generations, they may even seem extremely benevolent. But over time, Darwinian forces are likely to favor those that maximize their own influence and power (even if they do so secretly).
Robotic technology is not needed from the start, but is likely to become quite advanced over such a timeframe.
And as long as the results improve year over year, they would have little incentive to make changes.
The AI is still doing the job in the real world of allocating resources, hiring and firing people, and so on. It's not so complex as to be opaque. When an AI plays chess, the overall strategy might not be clear, but the actions it takes are still obvious.
When we have superintelligence, the AI is not going to hire a lot of people, only fire them.
And I fully expect that the technical platform it runs on, 50 years after the last human engineer is fired, is going to be as incomprehensible to humans as the complete codebase of Google is to a regular 10-year-old, at best.
The "code" it would be running might include some code written in a human-readable programming language, but would probably include A LOT of logic hidden deep inside neural networks with parameter spaces many orders of magnitude greater than GPT-4's.
And on the hardware side, the situation would be similar. Chips created by superintelligent AGIs are likely to be just as difficult to reverse engineer as the neural networks that created them.