zlacker

[parent] [thread] 3 comments
1. valine+(OP)[view] [source] 2023-11-20 20:59:51
Maybe. Goliath 120B took two different Llama variants and interwove their layers. Surprisingly, Goliath 120B quantized to 2-bit is outperforming Llama 70B at 4-bit on many benchmarks.

https://www.reddit.com/r/LocalLLaMA/comments/17vcr9d/llm_com...
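
Roughly the idea behind the interleave, as a minimal sketch (assuming two Llama-architecture checkpoints of the same width; the model names and layer ranges below are placeholders, not Goliath's actual recipe):

    # Minimal sketch of a layer interleave ("frankenmerge"). Placeholder
    # names and ranges; both parents must share hidden size and vocab.
    import torch.nn as nn
    from transformers import AutoModelForCausalLM

    model_a = AutoModelForCausalLM.from_pretrained("parent-model-a")  # hypothetical
    model_b = AutoModelForCausalLM.from_pretrained("parent-model-b")  # hypothetical

    # Take alternating, overlapping slices of decoder layers from each parent.
    slices = [(model_a, 0, 16), (model_b, 8, 24), (model_a, 17, 32), (model_b, 25, 40)]

    merged = []
    for src, start, end in slices:
        merged.extend(src.model.layers[start:end])

    # Reuse one parent as the scaffold: swap in the interleaved stack and
    # patch the config to match the new depth.
    model_a.model.layers = nn.ModuleList(merged)
    model_a.config.num_hidden_layers = len(merged)
    model_a.save_pretrained("interleaved-model")

Note the merge itself involves no training; the slices are just copied and stitched together.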

replies(1): >>ghotli+oz
2. ghotli+oz[view] [source] 2023-11-21 00:02:44
>>valine+(OP)
Do you happen to have a link to where that interwoven-layers bit is described? As far as I can tell it's not clear on the model cards.
replies(1): >>valine+LP
3. valine+LP[view] [source] [discussion] 2023-11-21 01:55:41
>>ghotli+oz
The model page is the only source of info I've found on it. As far as I can tell, there's no paper published on the technique.

In the “Merge Process” section they at least give the layer ranges.

https://huggingface.co/alpindale/goliath-120b
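
For intuition on how two 70B parents end up around 120B: each (start, end) slice contributes end - start layers, so overlapping slices make the child deeper than either 80-layer parent. A toy example (values are placeholders, not the ones from the card):

    # Toy ranges in the style of the card's "Merge Process" section; the
    # real values are on the Hugging Face page above.
    ranges = [
        ("parent_a", 0, 16),
        ("parent_b", 8, 24),
        ("parent_a", 17, 32),
        ("parent_b", 25, 40),
    ]

    # Overlapping slices stack: 16 + 16 + 15 + 15 = 62 layers here.
    print(sum(end - start for _, start, end in ranges))  # 62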

replies(1): >>ghotli+RX
4. ghotli+RX[view] [source] [discussion] 2023-11-21 02:47:43
>>valine+LP
Ah, reviewing that more closely, I actually found a link to it in the acknowledgements.

https://github.com/cg123/mergekit
