zlacker

1. m3kw9+(OP)[view] [source] 2023-12-21 00:17:10
I’m really struggling to find a use case for these local models when even ChatGPT 3.5 can do the job as well as any of them so far.
replies(2): >>coder5+b3 >>wongar+g5
2. coder5+b3[view] [source] 2023-12-21 00:40:06
>>m3kw9+(OP)
The article shows a (fine-tuned) Mistral 7B outperforming GPT-4, never mind GPT-3.5.
replies(1): >>m3kw9+L9
3. wongar+g5[view] [source] 2023-12-21 00:58:19
>>m3kw9+(OP)
Not everyone wants to send all their data to OpenAI or Microsoft. Sometimes it isn't legally possible even if you want to. And not every use-case is blessed with a permanent internet connection.

And for some use-cases, the "alignment" work on GPT 3.5 and 4 gets more in the way than it helps (even OpenAI admits that alignment makes the model perform worse, even on generic benchmarks).

4. m3kw9+L9[view] [source] [discussion] 2023-12-21 01:43:12
>>coder5+b3
From when I used it, this model is not even close to 3.5. For one, it does not follow instructions properly, and it just runs on and on.
replies(1): >>coder5+xb
5. coder5+xb[view] [source] [discussion] 2023-12-21 02:05:25
>>m3kw9+L9
What you're describing is the behavior you get from any base model that has not been instruction-tuned. The article is clear that this model is not for "direct use". It needs tuning for a specific application.
replies(1): >>m3kw9+Di
6. m3kw9+Di[view] [source] [discussion] 2023-12-21 03:24:46
>>coder5+xb
How does one fine-tune it to follow instructions? I would have thought there were open-source training sets for these instruction-following fine-tunes.
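
For reference, one way this is commonly done, sketched under assumptions: several open instruction datasets exist (databricks/databricks-dolly-15k is one), and libraries such as Hugging Face TRL wrap the supervised fine-tuning loop. The model name, prompt template, and hyperparameters below are illustrative, and the exact trainer argument names vary between TRL versions:

    # Sketch of supervised instruction tuning (SFT) with LoRA on an open
    # dataset. Everything here is illustrative, not a tested recipe.
    from datasets import load_dataset
    from peft import LoraConfig
    from transformers import TrainingArguments
    from trl import SFTTrainer

    # One of several openly licensed instruction-following datasets.
    raw = load_dataset("databricks/databricks-dolly-15k", split="train")

    def to_text(example):
        # Collapse each record into one prompt/response string; the exact
        # template is an arbitrary choice.
        context = f"\n{example['context']}" if example["context"] else ""
        example["text"] = (
            f"### Instruction:\n{example['instruction']}{context}\n\n"
            f"### Response:\n{example['response']}"
        )
        return example

    dataset = raw.map(to_text)

    trainer = SFTTrainer(
        model="mistralai/Mistral-7B-v0.1",   # base (non-instruct) checkpoint
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=1024,
        # LoRA keeps the fine-tune cheap enough for a single GPU.
        peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                               task_type="CAUSAL_LM"),
        args=TrainingArguments(
            output_dir="mistral-7b-dolly-lora",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            num_train_epochs=1,
            learning_rate=2e-4,
        ),
    )
    trainer.train()

After training, the resulting LoRA adapter can be loaded alongside the base weights (or merged into them) for inference.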