zlacker

>>rzk+(OP)
Are language models really the best choice for this?

Seems to me that the outcome would be near random because they are so poorly suited to it, which might manifest as:

> We also found that the models were highly sensitive to seemingly trivial prompt changes

>>Havoc+57
No, LLMs are not a good choice for this – as the results show! If I had to guess, they're experimenting with LLMs for publicity.

>>kqr+wc
Exactly. This is a performance by a really bad method actor.