So the first problem seems to be classifying user-flagged comments by whether they actually violate the rules.
And if this ML labeling is successful, then automate the unflagging, or whichever action is most frequent and most easily automated, to reduce the queue for manual processing.
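Roughly, the triage step could look like the sketch below. Everything in it is an assumption for illustration: the `FlaggedComment` type, the `triage` function, the `unflag_below` threshold, and especially `toy_violation_probability`, which is a keyword stand-in for whatever trained classifier ends up being used. The idea is only that comments the model is confident are fine get unflagged automatically, and everything else stays in the manual queue.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FlaggedComment:
    # Hypothetical record for a user-flagged comment.
    comment_id: int
    text: str


def triage(
    comments: List[FlaggedComment],
    violation_probability: Callable[[str], float],
    unflag_below: float = 0.1,  # assumed threshold, would need tuning
) -> Tuple[List[FlaggedComment], List[FlaggedComment]]:
    """Split flagged comments into (auto_unflag, manual_queue)."""
    auto_unflag: List[FlaggedComment] = []
    manual_queue: List[FlaggedComment] = []
    for comment in comments:
        p = violation_probability(comment.text)
        if p < unflag_below:
            # Model is confident there is no violation: unflag automatically.
            auto_unflag.append(comment)
        else:
            # Anything uncertain or likely violating stays with human reviewers.
            manual_queue.append(comment)
    return auto_unflag, manual_queue


# Stand-in for a trained model, NOT a real classifier: a tiny keyword check.
BLOCKLIST = {"idiot", "spam"}


def toy_violation_probability(text: str) -> float:
    words = set(text.lower().split())
    return 0.9 if words & BLOCKLIST else 0.05


flagged = [
    FlaggedComment(1, "Thanks, this fixed my problem"),
    FlaggedComment(2, "You are an idiot"),
    FlaggedComment(3, "Buy cheap spam here"),
]
auto_unflag, manual_queue = triage(flagged, toy_violation_probability)
print([c.comment_id for c in auto_unflag])
print([c.comment_id for c in manual_queue])
```

Keeping the threshold low means the automated action only fires on the easy cases, so a mistake by the model costs a reviewer some extra work rather than letting a real violation through.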