Seems to me that the outcome would be near random because they are so poorly suited. Which might manifest as
> We also found that the models were highly sensitive to seemingly trivial prompt changes