It’s not an anti-goal that’s intentionally set; it’s that complex goal-setting is hard, and you may end up with something dumb that maximizes the reward unintentionally.
The issue is that all of the AGIs will be unaligned in different ways, because we don’t know how to align any of them. Also, the first one able to improve itself in pursuit of its goal could take off past some threshold, and then the others would not be relevant.
There’s a lot of thoughtful writing on this topic, and it’s really worth digging into the state of the art; your replies are thoughtful, so it sounds like something you’d think about. I did the same a few years ago (around 2015) and found the arguments persuasive.
This is a decent overview: https://www.samharris.org/podcasts/making-sense-episodes/116...
Thanks for reminding me that I need to properly write up why I don't think self-improvement is a huge issue.
(My thoughts won't fit into a comment, and I'll want to link to the write-up later.)