Jan Leike's OpenAI departure statement

>>jnnnth+(OP)
X.com links are currently broken on HN so I posted a screenshot instead

https://x.com/janleike/status/1791498187313963308

>>ldayle+uf
From this interview https://youtu.be/Nlkk3glap_U?feature=shared it seemed Anthropic was focusing more on those topics rather than winning the race. He hinted at being “forced” by the competition to release their models.

>>jnnnth+(OP)
> “Over the past few months […] we were struggling for compute

OpenAI literally said they were setting aside 20% of compute to ensure alignment [1] but if you read the fine print, what they said was that they are “dedicating 20% of the compute we’ve secured ‘to date’ to this effort” (emphasis mine). So if their overall compute has increased by 10x then that 20% is suddenly 2%, right? Is OpenAI going to be responsible or is it just a mad race (modelled from the top) to “win” the AI game?

[1] https://openai.com/index/introducing-superalignment/?utm_sou...

>>skepti+vy
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a...

>>reduce+rB
https://cdn.openai.com/papers/DALL_E_3_System_Card.pdf

very well. straight from the horse's mouth:

>When designing the red teaming process for DALL·E 3, we considered a wide range of risks3 such as:

>1. Biological, chemical, and weapon related risks

>2. Mis/disinformation risks

>3. Racy and unsolicited racy imagery

>4. Societal risks related to bias and representation

(4) is DEI bullshit verbatim, (3) is DEI bullshit de facto - we all know which side of the kulturkampf screeches about "racy" things (like images of conventionally attractive women in bikinis) in the current year.

I don't know which exact role did that exact individual play at trust/safety/ethics/fart-fart-blah-fart department over at openai, but it is painfully, very painfully obvious what are openai/microsoft/google/meta/anthropic/stability/etc afraid their models might do. in every fucking press release, they all bend over backwards to appease the kvetchers, who are ever ready, eager and willing to post scalding hot takes all over X (formerly known as twitter).

>>123yaw+nD
Again, the superalignment team that Jan Leike and Ilya was working on, along with Yudkowsky's opinions, are unrelated to any DEI and "racy"-ness.

You can read the Superalignment announcement and what it focuses on. The entire thing is about AGI x-risk, with a small paragraph about how there's other people's work about whatever bias and PC-ness.

These are different concerns by different people. You and many others are pattern matching AGI x-risk to the AI bias people to your detriment and it's poisoning the discourse. Listen to Emmett Shear (former OpenAI/Twitch CEOs) explain this in depth: https://www.youtube.com/watch?v=jZ2xw_1_KHY&t=800s

>>skepti+vy
I find it delightfully ironic that humans are so bad at the things we criticise AI for not being able to do, such as extrapolating to outside our experience.

As a society, we don't even agree on the meanings of each of the initials of "AGI", and many of us use the triplet to mean something (super-intelligence) that isn't even one of those initials; for your claim to be true, AGI has to be a higher standard than "intern of all trades, senior of none" because that's what the LLMs do.

Expert-at-everything-level AGI is dangerous because the definition of the term is that it can necessarily do anything that a human can do[0], and that includes triggering a world war by assassinating an archduke, inventing the atom bomb, and at least four examples (Ireland, India, USSR, Cambodia) of killing several million people by mis-managing a country that they came to rule by political machinations that are just another skill.

When it comes to AI alignment, last I checked we don't know what we even mean by the concept: if you have two AI, there isn't even a metric you can use to say if one is more aligned than the other.

If I gave a medieval monk two lumps of U-238 and two more of U-235, they would not have the means to determine which pair was safe to bash together and which would kill them in a blue flash. That's where we're at with AI right now. And like the monks in this metaphor, we also don't have the faintest idea if the "rocks" we're "bashing together" are "uranium", nor what a "critical mass" is.

Sadly this ignorance isn't a shield, as evolution made us without any intentionality behind it. So we don't know how to recognise "unsafe" when we do it, we don't know if we might do it by accident, we don't know how to do it on purpose in order to say "don't do that", and because of this we may be doing cargo-cult "intelligence" and/or "safety" at any given moment and at any given scale, making us fractally-wrong[1] about basically every aspect including which ones we should even care about.

[0] If you think it needs a body, I'd point out we've already got plenty of robot bodies for it to control, the software for these is the hard bit

[1] https://blog.codinghorror.com/the-php-singularity/

>>Atotal+xh
Claude Opus doesn't appear to have more safety limitations than ChatGPT.

The older Claude 2.1 on the other hand was so ridiculously incapable of functioning due to a safety first design I'm guessing it inspired the goodie 2 parody AI. https://www.goody2.ai/

>>amenho+ei
This feels like the 2022 version of Steve Ballmer's infamous, deeply cringe on-stage freakout at a Microsoft event: https://www.youtube.com/watch?v=_WW2JWIv6G8

zlacker

Jan Leike's OpenAI departure statement