To me it feels like it detects whether a question could be answered more cheaply by the code interpreter model or 4 Turbo and then offloads it to one of those, and they just kinda suck compared to OG 4.
I’ve watched it fumble and fail to solve a problem with CI: it took 3 attempts over 5 minutes of real time and just gave up in the end, on a problem that OG 4 can one-shot with no preamble.