It's not obvious whether there's any automated way to reliably detect the difference between "use of HDR" and "abuse of HDR". But you could probably catch the most egregious cases, like "every single pixel in the video has brightness above 80%".
That sounds like a job our new AI overlords could probably handle. (But that might be overkill.)
My idea is: for each frame, convert the image to grayscale (luminance), then count what percentage of its pixels are above the standard SDR white level. If more than 20% of the image is brighter than SDR white, tone-map the whole video down to the SDR white point.
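Here's a rough Python sketch of that check. It rests on a few assumptions that aren't in the idea above. Frames arrive as flat lists of linear-light RGB pixels already in nits, so PQ decoding is skipped. "Grayscale" means luminance using the BT.2020/BT.2100 weights. "SDR white" is taken as 203 nits, the HDR reference white from ITU-R BT.2408. The function names are hypothetical. It only does the detection part; the actual tone mapping is out of scope.

```python
from typing import Iterable, Sequence, Tuple

# Assumption: a pixel is linear-light (R, G, B) with each channel in nits.
RGB = Tuple[float, float, float]

# Assumption: BT.2408 HDR reference white (203 nits) stands in for "SDR white".
SDR_WHITE_NITS = 203.0
# The "more than 20% of the image" rule from the comment above.
THRESHOLD_FRACTION = 0.20


def luminance(pixel: RGB) -> float:
    """Grayscale a linear RGB pixel using BT.2020/BT.2100 luminance weights."""
    r, g, b = pixel
    return 0.2627 * r + 0.6780 * g + 0.0593 * b


def fraction_above_white(frame: Sequence[RGB], white: float = SDR_WHITE_NITS) -> float:
    """Fraction of the frame's pixels whose luminance exceeds `white`."""
    if not frame:
        return 0.0
    bright = sum(1 for p in frame if luminance(p) > white)
    return bright / len(frame)


def should_tone_map(
    frames: Iterable[Sequence[RGB]],
    threshold: float = THRESHOLD_FRACTION,
    white: float = SDR_WHITE_NITS,
) -> bool:
    """Hypothetical helper: True if any single frame is more than
    `threshold` brighter-than-white, meaning the whole video gets tone-mapped."""
    return any(fraction_above_white(f, white) > threshold for f in frames)


if __name__ == "__main__":
    dim = [(100.0, 100.0, 100.0)] * 10                           # all ~100 nits
    blinding = [(1000.0, 1000.0, 1000.0)] * 3 + [(100.0, 100.0, 100.0)] * 7
    print(fraction_above_white(blinding))                        # 3 of 10 pixels
    print(should_tone_map([dim, dim]), should_tone_map([dim, blinding]))
```

One judgment call here is that a single offending frame flags the whole video. You'd probably want something less twitchy in practice, like requiring the condition to hold for some minimum run of frames, so a brief explosion or flash doesn't trip it.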