In medical AI, where I'm currently working, "ground truth" is usually whatever human experts say about a medical image, and it is rarely perfect. The goal is always to do better than whatever the current ground truth is.
But even if you take state-of-the-art knowledge as your ground truth, aligning to it is incredibly hard. Medicine is a great example: you're trying to build a causal graph in a highly noisy environment. Ask 10 doctors and you'll get 12 diagnoses. The problem is that subtle things become incredibly important, which is exactly what makes measurement so fucking hard. There is no state of the art in any well-defined sense.
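To make that concrete, here is a minimal sketch in Python of the noise-floor problem. The labels, case counts, and the "benign/malignant/uncertain" vocabulary are entirely made up for illustration; the point is only that once annotators disagree, the consensus label you score against is itself uncertain.

```python
from collections import Counter

# Hypothetical: 10 experts each label 5 imaging cases.
labels_per_case = [
    ["benign"] * 7 + ["malignant"] * 2 + ["uncertain"],
    ["malignant"] * 6 + ["benign"] * 4,
    ["uncertain"] * 5 + ["benign"] * 3 + ["malignant"] * 2,
    ["benign"] * 9 + ["malignant"],
    ["malignant"] * 5 + ["benign"] * 5,  # dead split: no real majority
]

for i, labels in enumerate(labels_per_case):
    consensus, votes = Counter(labels).most_common(1)[0]
    agreement = votes / len(labels)
    print(f"case {i}: consensus={consensus!r}, agreement={agreement:.0%}")

# Any model evaluated against "consensus" inherits this noise floor:
# on the last case the reference label is effectively a coin flip.
```

The same sketch scales to real inter-rater statistics (e.g. Fleiss' kappa), but even this toy version shows why "accuracy against ground truth" stops being a well-defined number.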
The point is that in most domains this is how things are. Even in programming.
Getting the right answer isn't enough