zlacker

1. aspenm+(OP)[view] [source] 2026-01-02 00:08:21
I could have guessed you would say that :) but METR is not an unbiased study either. Maybe you mean that METR is less likely to intentionally inflate their numbers?

If you insist on or believe in a conspiracy, I don’t think there’s anything I or others could say or show you that would reassure you. All I can say is that I’ve seen the raw data. It’s a mess, and again we’re stuck with proxies (which are bad because you start conflating a change in the proxy-latent relationship with the treatment effect). It’s also hard, and arguably irresponsible, to run RCTs.

All I will say is: there are flaws everywhere. METR results are far from conclusive. Totally understandable if there is a mismatch between perception and performance. But also consider: even if a task takes the same or even slightly more time, one big advantage for me is that it substantially reduces cognitive load, so I can work in parallel sessions on two completely different issues.

replies(1): >>bopbop+Q2
2. bopbop+Q2[view] [source] 2026-01-02 00:28:24
>>aspenm+(OP)
I bet it does reduce your cognitive load, considering that you, in your own words, "Give up when Claude is hopelessly lost". No better way to reduce cognitive load.
replies(1): >>aspenm+z7
3. aspenm+z7[view] [source] [discussion] 2026-01-02 01:03:22
>>bopbop+Q2
I give up using Claude when it gets hopelessly lost, and then my cognitive load increases.