Discussion: "LLMs cannot find reasoning errors, but can correct them"
1. valine+ke | 2023-11-20 20:28:09
>>koie+(OP)
I wonder if separate LLMs can find each other’s logical mistakes. If I ask llama to find the logical mistake in Yi’s output, would that work better than llama finding a mistake in its own output?

A logical mistake might imply a blind spot inherent to the model, a blind spot that might not be present in all models.
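
Concretely, the loop I have in mind is something like the sketch below, assuming two locally served OpenAI-compatible endpoints (llama.cpp, vLLM, etc.); the ports and model names are placeholders, not anything from the paper:

    # A minimal sketch, assuming two OpenAI-compatible servers (e.g. llama.cpp
    # or vLLM) on hypothetical local ports; model names are placeholders.
    from openai import OpenAI

    llama = OpenAI(base_url="http://localhost:8001/v1", api_key="none")
    yi = OpenAI(base_url="http://localhost:8002/v1", api_key="none")

    question = "If 3 painters finish a fence in 6 hours, how long do 2 painters take?"

    # Yi produces an answer together with its reasoning.
    answer = yi.chat.completions.create(
        model="yi-34b-chat",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # llama is asked only to locate a mistake, not to re-solve the problem.
    critique = llama.chat.completions.create(
        model="llama-2-70b-chat",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nProposed answer:\n{answer}\n\n"
                       "Point out the first logical mistake in this answer, if any.",
        }],
    ).choices[0].message.content

    print(critique)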

2. EricMa+gk | 2023-11-20 20:52:20
>>valine+ke
Wouldn't this effectively be using a "model" twice the size?

Would it be better to just double the size of one model rather than host both?

Genuine question.

3. valine+0m | 2023-11-20 20:59:51
>>EricMa+gk
Maybe. Goliath 120B took two different llama variants and interwove their layers. Surprisingly, Goliath 120B quantized to 2-bit is outperforming llama 70B at 4-bit on many benchmarks.

https://www.reddit.com/r/LocalLLaMA/comments/17vcr9d/llm_com...
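
As I understand it, the merge just stacks alternating, overlapping slices of decoder layers from the two donors into one deeper model. A toy sketch of that slicing plan (the ranges here are made up for illustration, not the actual ones from the model card):

    # Toy sketch of the "frankenmerge" slicing idea: copy alternating,
    # overlapping ranges of decoder layers from two donor models into one
    # deeper stack. Ranges below are illustrative, NOT the real ones.
    from dataclasses import dataclass

    @dataclass
    class Slice:
        donor: str     # which source model the slice comes from
        layers: range  # decoder-layer indices copied verbatim

    plan = [
        Slice("donor_a", range(0, 17)),
        Slice("donor_b", range(8, 25)),
        Slice("donor_a", range(17, 33)),
        Slice("donor_b", range(25, 41)),
        # ... keep alternating until the final layers of both donors
    ]

    depth = sum(len(s.layers) for s in plan)
    print(f"merged depth: {depth} decoder layers")
    for i, s in enumerate(plan):
        print(f"slice {i}: {s.donor} layers {s.layers.start}..{s.layers.stop - 1}")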

4. ghotli+oV | 2023-11-21 00:02:44
>>valine+0m
Do you happen to have a link to where that interwoven-layers technique is described? As far as I can tell it's not explained on the model cards.
5. valine+Lb1 | 2023-11-21 01:55:41
>>ghotli+oV
The model page is the only info I’ve found on it. As far as I can tell there’s no paper published on the technique.

In the “Merge Process” section they at least give the layer ranges.

https://huggingface.co/alpindale/goliath-120b

6. ghotli+Rj1 | 2023-11-21 02:47:43
>>valine+Lb1
Ah, on reviewing that more closely I actually found a link to it in the acknowledgements.

https://github.com/cg123/mergekit
