Funny how, if you'd kept reading before commenting, you'd have seen they addressed that point specifically:
> We were cautious to only run after each model’s training cutoff dates for the LLM models. That way we could be sure models couldn’t have memorized market outcomes.