Fails at math of course, even if the problem is very easy, like all Mistrals. Good for generation, probably not the best for RAG; there are Mistral tunes that stay coherent to 16k tokens, and that cuts down chunking significantly.
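To make the chunking point concrete, here's a rough back-of-the-envelope sketch (all numbers are made up for illustration, not benchmarks):

```python
# Rough illustration: how context length affects chunk count for RAG.
# All numbers are illustrative, not measurements.
doc_tokens = 200_000          # size of the corpus you want to retrieve over
overlap = 200                 # overlap between adjacent chunks

for context in (4_096, 16_384):
    chunk_size = context // 2          # leave room for the question and answer
    step = chunk_size - overlap
    chunks = -(-doc_tokens // step)    # ceiling division
    print(f"{context:>6} ctx -> chunk size {chunk_size}, ~{chunks} chunks")
```

With a model that stays coherent to 16k you can use chunks roughly 4x larger, so the same corpus needs roughly a quarter of the chunks.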
Edit: mistook tokens for parameters for a moment there. Keeping up with AI jargon is exhausting for an idiot like me.
What you see in the link is a copy-paste of a discussion between me and the model in question, which I pasted into GPT-4 with instructions to evaluate it. The answer with the scores out of 10 is GPT-4 evaluating the chat between me and the smaller model. The smaller model produces the text after ASSISTANT; the questions I ask as USER are part of a fixed script that I run with every new model, so that I have a sort of validation set before doing more rigorous testing.
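For anyone curious, a minimal hypothetical sketch of what such a fixed-script smoke test can look like (the endpoint, model name, and questions are placeholders, not the actual setup described above), assuming a local OpenAI-compatible server:

```python
# Hypothetical smoke test: run a fixed list of questions against a new model
# and dump the transcript so it can be pasted elsewhere for evaluation.
# Assumes a local OpenAI-compatible endpoint (llama.cpp / vLLM style).
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder URL
MODEL = "openhermes-2.5"                                 # placeholder name

QUESTIONS = [
    "A train leaves at 9:00 travelling 60 km/h ...",     # placeholder items
    "Summarise the following paragraph in one sentence: ...",
]

for q in QUESTIONS:
    resp = requests.post(ENDPOINT, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": q}],
        "temperature": 0,
    }, timeout=120)
    answer = resp.json()["choices"][0]["message"]["content"]
    print(f"USER: {q}\nASSISTANT: {answer}\n")
```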
What did OpenAI do for the LLM to know "if given a math question, write Python for it, and run the code to get the result" instead of trying to do the math itself?
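For illustration, a minimal hypothetical sketch of the pattern the question describes, i.e. prompt the model to answer with Python and have a harness execute whatever it writes. This is not OpenAI's actual implementation; the model call is stubbed out:

```python
# Hypothetical sketch of the "write code for math, run it" pattern.
# A real setup would send SYSTEM_PROMPT + the question to an LLM and
# capture the code it returns; here the model call is a stub.
import subprocess, sys, textwrap

SYSTEM_PROMPT = (
    "If the user asks a math question, do not compute it yourself. "
    "Reply with a single Python program that prints the answer."
)

def fake_model(question: str) -> str:
    # Stand-in for an actual LLM call.
    return textwrap.dedent("""
        a = 1234 * 5678
        print(a)
    """)

def answer_math(question: str) -> str:
    code = fake_model(question)                 # model writes the code
    result = subprocess.run(                    # harness runs it
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout.strip()

print(answer_math("What is 1234 * 5678?"))      # -> 7006652
```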
You can ask them to serialize a problem in Prolog and see exactly where their understanding breaks - this is OpenHermes 2.5: https://pastebin.com/raw/kr62Hybq