>Sometimes it will spit out terrible, horrid answers. I believe this might be due to the time of day / too many users. They limit tokens.
>Sometimes it will lie because of its alignment
>Sometimes I feel like it tests things on me
So yes, you are right, GPT-4 is overall better, but I find myself using local models because I stopped trusting GPT-4.
But fine-tuning on just a few tasks?
Depending on the task, it's totally reasonable to expect that a 7B model might eke out a win against stock GPT-4. Especially if the fine-tune bakes in domain knowledge and the given task is light on demand for logical skills.
The best open source has to offer is Mixtral, which will confidently make up a biography of a person it's never heard of before, or write a script that imports nonexistent libraries.
[1]: https://twitter.com/RobLynch99/status/1734278713762549970
(Though with that said, the seasonal issue might be common to any LLM with training data annotated by time of year.)
There is nothing unreasonable about this. However, I do dislike it when that information is presented in a fishy way, implying that it "outperforms GPT-4" without any qualification.
It’s an argument they make at least as much to market fine-tuning as to market their own model.
This is not a generic model that outperforms another generic model (GPT-4).
That can of course have useful applications, because the resource cost is then comparatively minuscule for certain business use cases.
Their methodology also appears to be "try 12 different models and hope one of them wins out." Multiple-hypothesis adjustments come to mind here :)
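To make the concern concrete: a minimal sketch of the simplest such adjustment (Bonferroni), assuming 12 independent model comparisons at a nominal 0.05 significance level. The function name and numbers are my own illustration, not from the post:

```python
# If you test 12 candidate models and crown whichever one "beats GPT-4",
# the chance of at least one spurious win grows with the number of tries.
# Bonferroni tightens the per-comparison threshold to compensate.

def bonferroni_threshold(alpha: float, num_comparisons: float) -> float:
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / num_comparisons

# With 12 models tried, a nominal alpha of 0.05 shrinks to about 0.0042,
# i.e. each individual "win" needs much stronger evidence.
print(bonferroni_threshold(0.05, 12))
```

Holm or Benjamini-Hochberg would be less conservative choices, but the point stands either way: one winner out of twelve tries is weak evidence on its own.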
Some of the things it said I’d done were genuinely good ideas, and I might actually go and do them at some point.
ChatGPT just said no.
Try a few blind comparisons: Mixtral 8x7B-Instruct and GPT-4 are 50-50 for me, and it outperforms 3.5 almost every time. And you can run inference on it with a modern CPU and 64 GB of RAM on a personal device, lmfao, even though its instruct fine-tuning has had nowhere near the money and RLHF that OpenAI has poured in. It's not a done deal, but people will be able to run models better than today's SOTA on <$1000 hardware in <3 months; I hope for their own sake that OpenAI is moving fast.
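A back-of-envelope check on why 64 GB of RAM is enough. The parameter count (~46.7B total for Mixtral 8x7B) and the ~4.5 bits/weight figure for a 4-bit GGUF-style quantization are rough community numbers I'm assuming, not official specs:

```python
# Weight memory for a quantized model, ignoring KV cache and runtime
# overhead: parameters * bits-per-parameter / 8 bytes.

def weights_gib(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB (weights only)."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

# ~46.7B params at ~4.5 bits/weight -> roughly 24-25 GiB of weights,
# which leaves comfortable headroom in a 64 GB machine.
print(round(weights_gib(46.7, 4.5), 1))
```

Note that only ~13B of those parameters are active per token (two experts), which is why CPU inference speed is closer to a 13B dense model than a 47B one.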
I don't think we will have an open-source GPT-4 for a long time, so this is sorta clickbait. But for small, specialized tasks, tuned on high-quality data, we are already in the "Linux era" of OSS models: they can do real, practical work.
EDIT: OK, so the prompt and outputs are long enough that adding them to the post directly would be kind of onerous. But I didn't want to leave you waiting, so I copied an example into a Notion doc you can see here: https://opipe.notion.site/PII-Redaction-Example-ebfd29939d25...
Not according to my calculations. At a low request rate, self-hosting is likely more expensive than GPT-4.
Can you recommend where I can learn more about hardware requirements for running Mistral/Mixtral?