Discussion: "LLMs cannot find reasoning errors, but can correct them"
1. valine+ke | 2023-11-20 20:28:09
>>koie+(OP)
I wonder if separate LLMs can find each other’s logical mistakes. If I ask llama to find the logical mistake in Yi’s output, would that work better than llama finding a mistake in its own output?

A logical mistake might imply a blind spot inherent to the model, a blind spot that might not be present in all models.
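
Concretely, the loop I have in mind is something like the sketch below, assuming two locally served OpenAI-compatible endpoints (llama.cpp, vLLM, etc.); the ports and model names are placeholders, not anything from the paper:

    # A minimal sketch, assuming two OpenAI-compatible servers (e.g. llama.cpp
    # or vLLM) on hypothetical local ports; model names are placeholders.
    from openai import OpenAI

    llama = OpenAI(base_url="http://localhost:8001/v1", api_key="none")
    yi = OpenAI(base_url="http://localhost:8002/v1", api_key="none")

    question = "If 3 painters finish a fence in 6 hours, how long do 2 painters take?"

    # Yi produces an answer together with its reasoning.
    answer = yi.chat.completions.create(
        model="yi-34b-chat",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # llama is asked only to locate a mistake, not to re-solve the problem.
    critique = llama.chat.completions.create(
        model="llama-2-70b-chat",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nProposed answer:\n{answer}\n\n"
                       "Point out the first logical mistake in this answer, if any.",
        }],
    ).choices[0].message.content

    print(critique)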

2. EricMa+gk | 2023-11-20 20:52:20
>>valine+ke
Wouldn't this effectively be using a "model" twice the size?

Would it be better to just double the size of one model rather than host both?

Genuine question.

3. valine+0m | 2023-11-20 20:59:51
>>EricMa+gk
Maybe. Goliath 120B took two different llama variants and interwove their layers. Surprisingly, Goliath 120B quantized to 2-bit is outperforming llama 70B at 4-bit on many benchmarks.

https://www.reddit.com/r/LocalLLaMA/comments/17vcr9d/llm_com...
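
As I understand it, the merge just stacks alternating, overlapping slices of decoder layers from the two donors into one deeper model. A toy sketch of that slicing plan (the ranges here are made up for illustration, not the actual ones from the model card):

    # Toy sketch of the "frankenmerge" slicing idea: copy alternating,
    # overlapping ranges of decoder layers from two donor models into one
    # deeper stack. Ranges below are illustrative, NOT the real ones.
    from dataclasses import dataclass

    @dataclass
    class Slice:
        donor: str     # which source model the slice comes from
        layers: range  # decoder-layer indices copied verbatim

    plan = [
        Slice("donor_a", range(0, 17)),
        Slice("donor_b", range(8, 25)),
        Slice("donor_a", range(17, 33)),
        Slice("donor_b", range(25, 41)),
        # ... keep alternating until the final layers of both donors
    ]

    depth = sum(len(s.layers) for s in plan)
    print(f"merged depth: {depth} decoder layers")
    for i, s in enumerate(plan):
        print(f"slice {i}: {s.donor} layers {s.layers.start}..{s.layers.stop - 1}")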

4. ghotli+oV | 2023-11-21 00:02:44
>>valine+0m
Do you happen to have a link to where that interwoven-layers technique is described? As far as I can tell it's not explained on the model cards.
5. valine+Lb1 | 2023-11-21 01:55:41
>>ghotli+oV
The model page is the only info I’ve found on it. As far as I can tell there’s no paper published on the technique.

In the “Merge Process” section they at least give the layer ranges.

https://huggingface.co/alpindale/goliath-120b

6. ghotli+Rj1 | 2023-11-21 02:47:43
>>valine+Lb1
Ah, on reviewing that more closely I actually found a link to it in the acknowledgements.

https://github.com/cg123/mergekit
