
1. nickth+(OP)[view] [source] 2023-12-20 21:03:55
I am very open to believing that. I'd love to see some examples.
replies(4): >>GaggiX+s1 >>turnso+M1 >>shiftp+T4 >>buggle+J6
2. GaggiX+s1[view] [source] 2023-12-20 21:11:03
>>nickth+(OP)
Well, it's pretty easy to find examples online. This one uses Llama 2, not even Mistral or any fancy techniques: https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...
3. turnso+M1[view] [source] 2023-12-20 21:13:10
>>nickth+(OP)
I agree, I think they need an example or two on that blog post to back up the claim. I'm ready to believe it, but I need something more than "diverse customer tasks" to understand what we're talking about.
4. shiftp+T4[view] [source] 2023-12-20 21:31:37
>>nickth+(OP)
They're quite close in arena format: https://chat.lmsys.org/?arena
replies(1): >>TOMDM+17
5. buggle+J6[view] [source] 2023-12-20 21:43:55
>>nickth+(OP)
You can fine-tune a small model yourself and see. GPT-4 is an amazing general model, but it won't perform the best at every task you throw at it out of the box. I have a fine-tuned Mistral 7B model that outperforms GPT-4 on a specific type of structured data extraction. Maybe if I fine-tuned GPT-4 it could beat my model, but that costs a lot of money for something I can now do locally for the cost of electricity.
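Not my exact pipeline, but here's a minimal sketch of what that kind of fine-tune looks like with Hugging Face transformers + peft (a LoRA adapter on Mistral 7B); the dataset file, prompt format, and hyperparameters below are placeholders, not my real setup:

  # Minimal sketch: LoRA fine-tuning Mistral 7B for a structured-extraction task.
  # Dataset path, prompt schema, and hyperparameters are placeholders.
  import torch
  from datasets import load_dataset
  from peft import LoraConfig, get_peft_model
  from transformers import (
      AutoModelForCausalLM,
      AutoTokenizer,
      DataCollatorForLanguageModeling,
      Trainer,
      TrainingArguments,
  )

  model_name = "mistralai/Mistral-7B-v0.1"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  tokenizer.pad_token = tokenizer.eos_token

  model = AutoModelForCausalLM.from_pretrained(
      model_name, torch_dtype=torch.bfloat16, device_map="auto"
  )

  # Attach small LoRA adapters instead of updating all 7B weights.
  lora = LoraConfig(
      r=16,
      lora_alpha=32,
      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
      task_type="CAUSAL_LM",
  )
  model = get_peft_model(model, lora)

  # Each training example pairs a raw document with the JSON you want extracted,
  # e.g. {"text": "<instruction>\n<document>\n### Output:\n<json>"} (hypothetical schema).
  dataset = load_dataset("json", data_files="extraction_train.jsonl")["train"]

  def tokenize(example):
      return tokenizer(example["text"], truncation=True, max_length=1024)

  tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

  trainer = Trainer(
      model=model,
      args=TrainingArguments(
          output_dir="mistral-7b-extractor",
          per_device_train_batch_size=1,
          gradient_accumulation_steps=8,
          num_train_epochs=3,
          learning_rate=2e-4,
          bf16=True,
          logging_steps=10,
      ),
      train_dataset=tokenized,
      # mlm=False makes the collator build causal-LM labels from input_ids.
      data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
  )
  trainer.train()
  model.save_pretrained("mistral-7b-extractor")

The LoRA adapters keep the trainable parameter count tiny, which is what makes training and then serving this on a single GPU practical.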
6. TOMDM+17[view] [source] [discussion] 2023-12-20 21:45:53
>>shiftp+T4
To be clear, Mixtral is very competitive; Mistral, while certainly way better than most 7B models, performs far worse than GPT-3.5 Turbo.
replies(1): >>shiftp+Ac
7. shiftp+Ac[view] [source] [discussion] 2023-12-20 22:18:30
>>TOMDM+17
Apologies, that's what I get for skimming through the thread.