zlacker

[parent] [thread] 7 comments
1. Whitne+(OP)[view] [source] 2023-12-27 18:44:11
llama.cpp performs better than what?

GPT-3.5 Turbo, one of the GPT-4 models, the API, or the app?

replies(1): >>buggle+N
2. buggle+N[view] [source] 2023-12-27 18:47:32
>>Whitne+(OP)
JSON mode and function-calling with a JSON schema in the OpenAI API.
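For reference, a minimal sketch of what those two request shapes look like, built as plain dicts so nothing hits the network; the model name, schema, and `record_entity` tool are illustrative placeholders, not anything from this thread:

```python
import json

# Hypothetical extraction schema -- illustrative only.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["name", "year"],
}

# JSON mode: response_format guarantees syntactically valid JSON,
# but the desired shape still has to be described in the prompt.
json_mode_request = {
    "model": "gpt-3.5-turbo",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "Extract fields as JSON matching: " + json.dumps(schema)},
        {"role": "user", "content": "llama.cpp was released in 2023."},
    ],
}

# Function calling: the JSON schema goes in the tool definition, and the
# model returns arguments intended to conform to it.
tool_request = {
    "model": "gpt-3.5-turbo",
    "tools": [{
        "type": "function",
        "function": {"name": "record_entity", "parameters": schema},
    }],
    "messages": [
        {"role": "user", "content": "llama.cpp was released in 2023."},
    ],
}

# Either way, the reply's content (or tool-call arguments) is a JSON
# string you parse with json.loads.
reply = '{"name": "llama.cpp", "year": 2023}'  # example model output
parsed = json.loads(reply)
```

"Returns JSON more consistently" in this thread is about how often that final `json.loads` succeeds and matches the schema, since JSON mode only enforces syntax, not structure.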
replies(1): >>Whitne+V9
3. Whitne+V9[view] [source] [discussion] 2023-12-27 19:39:06
>>buggle+N
Right, but which model?

It makes a huge difference.

replies(1): >>buggle+Ha
4. buggle+Ha[view] [source] [discussion] 2023-12-27 19:42:43
>>Whitne+V9
I’ve been using OpenChat 3.5 1210 most recently. Before that, Mistral-OpenOrca. Both return JSON more consistently than gpt-3.5-turbo.
replies(1): >>airstr+Tc
5. airstr+Tc[view] [source] [discussion] 2023-12-27 19:54:52
>>buggle+Ha
gpt-3.5-turbo is not the benchmark
replies(1): >>buggle+lo
6. buggle+lo[view] [source] [discussion] 2023-12-27 20:49:24
>>airstr+Tc
I don’t know what point you’re trying to make. They also return JSON more consistently than gpt-4, but I don’t use that because it’s overkill and expensive for my text extraction tasks.
replies(1): >>Whitne+5y
7. Whitne+5y[view] [source] [discussion] 2023-12-27 21:48:41
>>buggle+lo
Because people have different interests and want to hear your results for different reasons.

Some want to consider results relative to cost, and some are interested only in how it compares to SOTA.

replies(1): >>buggle+xJ
8. buggle+xJ[view] [source] [discussion] 2023-12-27 22:56:32
>>Whitne+5y
I mean, sure, but the parent should also just explicitly state what it is they were asking or claiming. I’ve answered every question asked. Making vague declarations about something not being “the benchmark,” while not stating what you think “the benchmark” should be, is unhelpful.