zlacker

[return to "Mistral 7B Fine-Tune Optimized"]
1. averev+Lo[view] [source] 2023-12-20 22:13:37
>>tosh+(OP)
not a bad model, becomes incoherent above 8k tokens, and it's not helped by the fact that it's very verbose, but it seems very coherent and stays closely on topic until then: https://chat.openai.com/share/089d1b8c-3467-4c01-af9f-6568c0...

fails at math of course, even if the problem is very easy, like all Mistrals. good for generation, probably not the best for RAG; there are Mistral tunes that stay coherent to 16k tokens, and that cuts down chunking significantly
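to make the chunking point concrete, here's a rough sketch of what I mean (the tokenizer name and chunk sizes are just illustrative, not my actual setup):

    # rough idea of chunking for RAG: split a long document into pieces that
    # fit the model's usable context. tokenizer name and sizes are examples only.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

    def chunk_by_tokens(text, max_tokens=2000, overlap=200):
        ids = tok.encode(text, add_special_tokens=False)
        chunks = []
        step = max_tokens - overlap
        for start in range(0, len(ids), step):
            chunks.append(tok.decode(ids[start:start + max_tokens]))
        return chunks

    # a tune that stays coherent to 16k instead of 8k lets you roughly double
    # max_tokens here, so far fewer chunks per document.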

◧◩
2. jahsom+py[view] [source] 2023-12-20 23:17:33
>>averev+Lo
~Is the 'k' in your token sizes a typo?~

Edit: mistook tokens for parameters for a moment there. Keeping up with AI jargon is exhausting for an idiot like me.

◧◩◪
3. averev+9z[view] [source] 2023-12-20 23:22:30
>>jahsom+py
No, it's the sequence length, i.e. how long the string in the prompt is, so to speak. At 8192 tokens it starts losing coherence, and by 10000 tokens it was emitting gibberish, like empty lines and half words; I didn't put the worst part into the link. What do you mean by ELII?
◧◩◪◨
4. jahsom+fz[view] [source] 2023-12-20 23:23:15
>>averev+9z
Explain like I'm an idiot :D
◧◩◪◨⬒
5. averev+lA[view] [source] 2023-12-20 23:30:38
>>jahsom+fz
Ah, hope my answer was clear enough then :D

What you see in the link is a copy-paste of a discussion between me and the model in question, which I pasted into GPT-4 with instructions to evaluate it. The answer with the scores out of 10 is GPT-4 evaluating the chat between me and the smaller model. The smaller model produces the text after ASSISTANT; the questions I ask as USER are part of a fixed script that I run with every new model, so that I have a sort of validation set before doing more rigorous testing.
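roughly the flow is this (just a sketch with made-up file and endpoint names; the real script is messier, and in practice I pasted the chat into the GPT-4 web UI rather than the API):

    # 1) run the fixed question list against the local model (served via an
    #    OpenAI-compatible endpoint here, purely as an example)
    # 2) hand the whole USER/ASSISTANT transcript to gpt-4 and ask for scores
    from openai import OpenAI

    questions = open("questions.txt").read().splitlines()  # the fixed script

    local = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
    transcript = []
    for q in questions:
        r = local.chat.completions.create(
            model="local-model",
            messages=[{"role": "user", "content": q}])
        transcript.append(f"USER: {q}\nASSISTANT: {r.choices[0].message.content}")

    judge = OpenAI()
    review = judge.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": "Rate each ASSISTANT answer out of 10:\n\n"
                              + "\n\n".join(transcript)}])
    print(review.choices[0].message.content)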
