There are concrete benchmarks, like “how good is it at answering multiple choice questions accurately” or “how good is it at producing valid code to solve a particular coding problem”.
There’s also a chatbot Elo ranking, which crowdsources head-to-head model comparisons: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...
GPT-4 is the king right now.
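For anyone curious how an Elo-style ranking turns individual "which answer was better?" votes into one score per model, here is a minimal sketch of the classic Elo update. The K factor of 32, the starting rating of 1000, and the model names are all assumptions for illustration. The leaderboard's actual methodology may differ.

```python
# Minimal Elo sketch for pairwise "which answer was better?" votes.
# ASSUMPTIONS: K factor (32) and starting rating (1000) are illustrative,
# not the leaderboard's real settings; model names are hypothetical.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def rate(votes, initial: float = 1000.0, k: float = 32.0) -> dict[str, float]:
    """Replay (model_a, model_b, score_a) votes in order and return ratings."""
    ratings: dict[str, float] = {}
    for a, b, score_a in votes:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = elo_update(ra, rb, score_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical votes: model_x wins once, then the two tie.
    votes = [("model_x", "model_y", 1.0), ("model_x", "model_y", 0.5)]
    print(rate(votes))
```

Two equally rated models start at 1000 each, and a single win moves them to 1016 and 984. Note that replaying votes one at a time makes the result depend on vote order, which is one reason a large crowd-sourced leaderboard might prefer to fit ratings over all comparisons at once.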