zlacker

There are concrete benchmarks like “how good is it at answering multiple-choice questions accurately” or “how good is it at producing valid code to solve a particular coding problem”.
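To make the multiple-choice case concrete, here is a rough sketch of how such a score could be computed. The function name, the answer key, and the predictions are all made up for illustration; real benchmarks have their own harnesses and prompt formats.

```python
def multiple_choice_accuracy(predictions, answer_key):
    """Fraction of questions where the model picked the keyed answer."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    if not answer_key:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)


# Hypothetical data: four questions, the model gets three right.
answer_key = ["B", "D", "A", "C"]
predictions = ["B", "D", "C", "C"]
score = multiple_choice_accuracy(predictions, answer_key)
print(f"accuracy: {score:.2f}")
```

A coding benchmark works the same way in spirit, except "correct" usually means the generated code passes a set of unit tests rather than matching a letter.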

There’s also a chatbot Elo ranking which crowdsources model comparisons: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...
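For anyone unfamiliar with Elo, here is a minimal sketch of how pairwise votes can turn into a ranking. This is the textbook Elo update, not necessarily what the leaderboard itself runs; the K-factor of 32, the starting rating of 1000, and the model names and votes are all assumptions for illustration.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo estimate of the probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0, initial=1000.0):
    """Apply one head-to-head result; unseen models start at `initial`."""
    r_win = ratings.get(winner, initial)
    r_lose = ratings.get(loser, initial)
    # The winner gains more when the win was unexpected.
    delta = k * (1.0 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + delta
    ratings[loser] = r_lose - delta
    return ratings


# Hypothetical crowd votes as (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]

ratings = {}
for winner, loser in votes:
    update_elo(ratings, winner, loser)

leaderboard = sorted(ratings, key=ratings.get, reverse=True)
for name in leaderboard:
    print(f"{name}: {ratings[name]:.1f}")
```

Each vote moves points from the loser to the winner, so the total stays constant and only the relative order matters.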

GPT-4 is the king right now.