zlacker

1. Athari+(OP)[view] [source] 2023-11-20 09:40:57
I don't consider Anthropic's approach to safety fantastic. They train the model to lie, play cat and mouse with jailbreakers, run moderation on generations with a delay, etc. This makes the model appear safer, as it's harder to jailbreak, but this approach solves nothing fundamentally.

If Ilya is concerned about safety and alignment, he probably has a better chance to get there with OpenAI, now that he has more control over it.

replies(2): >>didntc+ua >>dalore+Hh
2. didntc+ua[view] [source] 2023-11-20 10:52:09
>>Athari+(OP)
I haven't paid a lot of attention to Anthropic. Are you able to summarize, or link anything about, those events for those who missed it? Particularly the "training to lie" bit
replies(1): >>Athari+Xo
3. dalore+Hh[view] [source] 2023-11-20 11:39:28
>>Athari+(OP)
Anthropic's safety is overboard. I tried the classic question of "how many holes does a straw have?" and it refused to talk about the topic. I'm assuming that's because it thought "holes" was sexual.
replies(3): >>JBiser+bk >>visarg+Oo >>PH95Vu+c61
4. JBiser+bk[view] [source] [discussion] 2023-11-20 11:55:27
>>dalore+Hh
Given what AIs "know" about humanity, I think it's safe to assume that they "think" every word is sexual. For example straw could be short for strawman, which is a man, which is sexual. Or it can be innuendo for... you know.

As for your actual question, it seems to me that a straw is topologically equivalent to a torus, so it has 1 hole, right?

replies(1): >>TeMPOr+mu
5. visarg+Oo[view] [source] [discussion] 2023-11-20 12:29:53
>>dalore+Hh
When did you last try that? I checked right now and it says

> A straw has one hole that runs through its entire length.

replies(1): >>dalore+yw4
6. Athari+Xo[view] [source] [discussion] 2023-11-20 12:30:47
>>didntc+ua
David Shapiro complained about Anthropic's approach to alignment. In his video https://www.youtube.com/watch?v=PgwpqjiKkoY he discusses ableism, moralism, lying.

As to cat-and-mouse with jailbreakers, I don't remember any thorough articles or videos. It's mostly based on discussions on LLM forums. Claude is widely regarded as one of the best models for NSFW roleplay, which completely invalidates Anthropic's claims about safety and alignment being "solved."

7. TeMPOr+mu[view] [source] [discussion] 2023-11-20 13:03:43
>>JBiser+bk
> it seems to me that a straw is topologically equivalent to a torus, so it has 1 hole, right?

For a mathematician, yes. For everyone else, it obviously has two, because only when you plug one end does it have one.

8. PH95Vu+c61[view] [source] [discussion] 2023-11-20 15:55:36
>>dalore+Hh
that sentence makes no sense to me, what is a straw here?
9. dalore+yw4[view] [source] [discussion] 2023-11-21 12:44:19
>>visarg+Oo
Now follow up with: how many holes do trousers have?