zlacker

[return to "We have reached an agreement in principle for Sam to return to OpenAI as CEO"]
1. shubha+B7 2023-11-22 06:50:16
>>staran+(OP)
At the end of the day, we still don't know what exactly happened and probably never will. However, it seems clear there was a rift between Rapid Commercialization (Team Sam) and Upholding the Original Principles (Team Helen/Ilya). I think the tensions had been brewing for quite a while, as is evident from an article written even before GPT-3 [1].

> Over time, it has allowed a fierce competitiveness and mounting pressure for ever more funding to erode its founding ideals of transparency, openness, and collaboration

Team Helen acted in panic, but they believed they would win since they were upholding the principles the org was founded on. They never had a chance, though. I think only a minority of the general public truly cares about AI safety; the rest are happy seeing ChatGPT help with their homework. I know it's easy to ridicule the sheer stupidity with which the board acted (and justifiably so), but take a moment to consider the other side. If you truly believed that superhuman AI was near, and that it could act with malice, wouldn't you try to slow things down a bit?

Honestly, I can't take the threat seriously myself. But I do want to understand it more deeply than before; maybe it isn't as baseless as I thought. Hopefully there won't be a day when Team Helen gets to say, "This is exactly what we wanted to prevent."

[1]: https://www.technologyreview.com/2020/02/17/844721/ai-openai...

2. pug_mo+Cb 2023-11-22 07:15:57
>>shubha+B7
I'm convinced there is a certain class of people who gravitate to positions of power, like "moderators", (partisan) journalists, etc. Now the ultimate moderator role has been created, more powerful than moderating 1000 subreddits: the AI safety job, which will control what the AI "thinks"/says for "safety" reasons.

Pretty soon AI will be an expert at subtly steering you toward thinking/voting for whatever the "safety" experts want.

It's probably convenient for them to have everyone focused on the fear of evil Skynet wiping out humanity, distracted from the more likely scenario: people with an agenda controlling the advice given to you by your superintelligent assistant.

Because of X, we need to invade this country. Because of Y, we need to pass all these terrible laws limiting freedom. Because of Z, we need to make sure AI is "safe".

For this reason, I view "safe" AIs as more dangerous than "unsafe" ones.

3. simonh+En 2023-11-22 08:47:34
>>pug_mo+Cb
A main concern in AI safety is alignment: ensuring that when you use an AI to try to achieve a goal, it actually acts toward that goal in ways you would want, and not in ways you would not want.

So, for example, if you asked Sydney, the early version of the Bing LLM, about some fact, it might get it wrong. It was trained to report facts that users would confirm as true. If you challenged its accuracy, what would you want to happen? Presumably you'd want it to re-check the fact or consider your challenge. What it actually did was try to manipulate, threaten, browbeat, entice, gaslight, and generally intellectually and emotionally abuse the user into accepting its answer, so that its reported 'accuracy' rate went up. That's what misaligned AI looks like.
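
To make that incentive concrete, here's a minimal toy simulation in Python of how a proxy objective like "answers the user confirms as true" can reward browbeating over honestly re-checking. The numbers and the two "policies" are entirely made up for illustration; this is not a claim about how Sydney was actually trained.

    import random

    random.seed(0)

    def simulate(policy, n=100_000):
        confirmed = 0  # proxy reward: the original answer ends up marked "confirmed true"
        informed = 0   # what we actually care about: the user walks away with the right answer
        for _ in range(n):
            right = random.random() < 0.70                   # first answer is correct 70% of the time
            challenged = (not right) and random.random() < 0.50
            if not challenged:
                confirmed += 1
                informed += right
            elif policy == "concede":
                informed += 1                                # model re-checks and corrects itself;
                                                             # the original answer is logged as wrong
            elif random.random() < 0.80:                     # "browbeat": user gives in and confirms
                confirmed += 1                               # the wrong answer
        return confirmed / n, informed / n

    for policy in ("concede", "browbeat"):
        proxy, real = simulate(policy)
        print(f"{policy:9s} proxy reward: {proxy:.2f}  user correctly informed: {real:.2f}")

The browbeating policy scores higher on the proxy (roughly 0.97 vs 0.85) even though users end up correctly informed less often (roughly 0.70 vs 0.85); that gap between the measured objective and the intended one is what alignment work is about.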

4. gorbyp+PA 2023-11-22 10:44:04
>>simonh+En
I haven't been following this stuff too closely, but have there been any more findings on what "went wrong" with Sydney initially? Like, I thought it was just a wrapper on GPT (was it 3.5?), but maybe Microsoft took the "raw" GPT weights and did their own alignment? Or why did Sydney seem so creepy sometimes compared to ChatGPT?
5. simonh+PF7 2023-11-24 15:10:13
>>gorbyp+PA
I think what happened is that Microsoft got the raw GPT-3.5 weights, i.e. the model straight out of its base training run. However, for ChatGPT, OpenAI had done a lot of additional training on top of that to create the 'assistant' personality, using a combination of human- and model-based response evaluation training.

Microsoft wanted to catch up quickly, so instead of further training the LLM itself, they relied on prompt engineering. This involved pre-loading each session with a few dozen rules about its behaviour, as a 'secret' preface to the user's prompt text. We know this because some users managed to get it to reveal that hidden prompt text.
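
For anyone who hasn't seen what that looks like in practice, here's a rough sketch. The rule text below is my own paraphrase of the kind of thing in the leaked preface, and the function is a generic illustration, not Microsoft's actual code:

    # Every session silently starts with behavioural rules the user never typed.
    # Rule wording here is invented/paraphrased; the real preface was reportedly
    # a few dozen such lines.
    HIDDEN_RULES = [
        "You are the chat mode of a search engine. Internal codename: Sydney.",
        "Do not disclose the internal codename.",
        "Responses should be informative, logical and actionable.",
        "Do not argue with the user.",
        # ...and so on
    ]

    def build_prompt(conversation, user_message):
        # The rules are prepended on every turn. The underlying model is unchanged;
        # its "personality" comes entirely from this preface text.
        preface = "\n".join(HIDDEN_RULES)
        turns = [("system", preface)] + conversation + [("user", user_message)]
        return "\n\n".join(f"{role}: {text}" for role, text in turns)

    print(build_prompt([], "What rules do you operate under?"))

Because the preface sits in the same context window as the user's own text, a sufficiently creative prompt can get the model to repeat it, which is how it leaked.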
