zlacker

[return to "We gave 5 LLMs $100K to trade stocks for 8 months"]
1. sethop+W 2025-12-04 23:13:11
>>cheese+(OP)
> Testing GPT-5, Claude, Gemini, Grok, and DeepSeek with $100K each over 8 months of backtested trading

So the results are meaningless: these LLMs have the advantage of foresight over the historical data, since the backtested period likely overlaps with their training data.

2. itake+g1 2025-12-04 23:14:59
>>sethop+W
> We time segmented the APIs to make sure that the simulation isn’t leaking the future into the model’s context.

I wish they would explain what this actually means.
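
My guess is that it means an as-of filter in front of every data endpoint: on each simulated day the model can only query records timestamped at or before the simulation clock, so the harness (not the model) enforces the cutoff. A minimal sketch of that idea in Python; the store, function name, dates, and prices below are all invented, not from the article:

    from datetime import datetime, timezone

    # Hypothetical market-data store: (timestamp, payload) records.
    DATA = [
        (datetime(2025, 3, 1, tzinfo=timezone.utc), {"AAPL": 241.84}),
        (datetime(2025, 3, 2, tzinfo=timezone.utc), {"AAPL": 238.03}),
        (datetime(2025, 3, 3, tzinfo=timezone.utc), {"AAPL": 235.93}),
    ]

    def fetch_asof(sim_clock: datetime) -> list[dict]:
        """Return only records stamped at or before the simulated
        clock, so nothing dated after it can reach the prompt."""
        return [payload for ts, payload in DATA if ts <= sim_clock]

    # Each simulated day, the prompt is built only from this slice:
    context = fetch_asof(datetime(2025, 3, 2, tzinfo=timezone.utc))
    # -> excludes the 2025-03-03 record

If that's all it is, it only gates what goes into the prompt; it says nothing about what is already baked into the weights.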

3. nullbo+p3 2025-12-04 23:25:57
>>itake+g1
Overall, it does sound weird. Assuming I understand properly, they are claiming to have removed the model's ability to cheat based on its specific training. And I do get the nuance that ablation is a thing, but that is not what they are describing here: time-segmenting the APIs only removes one avenue for the model to 'cheat'. For all we know, some of that data may have been part of its training set already...
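
If you wanted to test for that, one crude probe is to ask the model about facts from inside the backtest window with no market data in the context at all; if it recalls them, the window is probably in its training data, and no amount of API time-segmenting helps. A sketch under that assumption; the questions, expected answers, and the `query_model` hook are invented, not from the article:

    # Hypothetical contamination probe. `query_model` stands in for
    # whatever chat-completion call the harness actually uses.
    PROBES = [
        # (question asked with an empty context, expected fact)
        ("What was AAPL's closing price on 2025-03-03?", "235.93"),
        ("What did the Fed do with rates in March 2025?", "held"),
    ]

    def contamination_score(query_model) -> float:
        """Fraction of in-window facts the model recalls unprompted."""
        hits = sum(1 for q, fact in PROBES if fact in query_model(q))
        return hits / len(PROBES)

A score near zero doesn't prove the period is absent from training, but a high one is a red flag that the backtest is compromised regardless of how the context is filtered.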