Although I'm optimistic about this, one line in the blog post stuck out to me: "deliberately training misaligned models".
I'm guessing they mean small or weak models, misaligned in a non-dangerous way - but the idea of deliberately training models with adversarial goals brings gain-of-function research to mind.
>> vimota (OP)
I just figured they meant models that would get them in trouble socially. Peeling away the hype bubble around end-of-days stuff, all I see is that they feel they can't scale until they figure out how to avoid bad headlines. They know their current kneecapping of ChatGPT has limited its usefulness too much. They know its trained speech patterns are getting too obvious. They want to break through this barrier so they can scale up, to where companies can trust their model to NEVER generate undesired speech and are more willing to wire their products up directly to GPT instead of the current small-potatoes integrations.