1. didntc+(OP) 2023-11-20 10:52:09
I haven't paid a lot of attention to Anthropic. Are you able to summarize those events, or link to anything about them, for those who missed them? Particularly the "training to lie" bit.
replies(1): >>Athari+te
2. Athari+te 2023-11-20 12:30:47
>>didntc+(OP)
David Shapiro complained about Anthropic's approach to alignment. In his video https://www.youtube.com/watch?v=PgwpqjiKkoY he discusses ableism, moralism, and lying.

As for the cat-and-mouse game with jailbreakers, I don't remember any thorough articles or videos; my impression is mostly based on discussions on LLM forums. Claude is widely regarded as one of the best models for NSFW roleplay, which completely invalidates Anthropic's claims about safety and alignment being "solved."