zlacker

1. tempus+(OP) 2023-11-20 07:58:33
There are concrete benchmarks like “how good is it at answering multiple-choice questions accurately” or “how good is it at producing valid code to solve a particular coding problem”.
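To make that concrete, here's a rough sketch of how both kinds of score could be computed. The function names, the answer key, and the toy problem are all made up for illustration; real benchmark harnesses are more involved and run generated code in a sandbox rather than with a bare exec.

```python
def accuracy(predictions, answers):
    """Fraction of multiple-choice questions where the model's pick matches the key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def solves_problem(candidate_source, func_name, test_cases):
    """Return True if the generated code defines func_name and passes every test case.

    WARNING: exec runs arbitrary code. This is fine for a toy example,
    but untrusted model output should be run in an isolated sandbox.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        # Syntax errors, missing functions, and runtime crashes all count as failures.
        return False


# Hypothetical data, not taken from any real benchmark.
answer_key = ["B", "D", "A", "C"]
model_picks = ["B", "D", "C", "C"]
mc_score = accuracy(model_picks, answer_key)

generated = "def add(a, b):\n    return a + b\n"
add_tests = [((1, 2), 3), ((0, 0), 0)]
code_ok = solves_problem(generated, "add", add_tests)
```

A benchmark number is then just this kind of score averaged over a large, fixed set of questions or problems.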

There’s also a chatbot Elo ranking that crowdsources model comparisons: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...
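For anyone unfamiliar with Elo, here's a minimal sketch of the standard update applied to a single head-to-head vote. The K-factor of 32 and the 400-point scale are the conventional chess defaults and are assumptions here; the leaderboard's actual method and constants may differ.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, outcome_a, k=32.0):
    """Return new (rating_a, rating_b) after one comparison.

    outcome_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the leaderboard.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level; a voter prefers model A.
new_a, new_b = update_elo(1000.0, 1000.0, outcome_a=1.0)
```

Repeat that over many crowdsourced votes and the ratings settle into a ranking. Beating a higher-rated model moves your rating more than beating a lower-rated one.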

GPT-4 is the king right now.
