If you're short on time, I'd recommend just reading the linked blogpost or the announcement thread here [1], rather than the full paper.
Also, cool work, very happy to see actually good evaluations instead of just vibes or observational studies that don't account for the Hawthorne effect.
We'll be releasing anonymized data and some basic analysis code to replicate core results within the next few weeks (probably next week, depending).
Our GitHub is here (http://github.com/METR/) -- or you can follow us (https://x.com/metr_evals) and we'll probably tweet about it.