We gave 5 LLMs $100K to trade stocks for 8 months

>>cheese+(OP)
> Testing GPT-5, Claude, Gemini, Grok, and DeepSeek with $100K each over 8 months of backtested trading

So the results are meaningless - these LLMs have the advantage of foresight over historical data.

>>sethop+W
> We were cautious to only run after each model’s training cutoff dates for the LLM models. That way we could be sure models couldn’t have memorized market outcomes.

>>PTRFRL+f1
Even if it is after the cut off date wouldn't the models be able to query external sources to get data that could positively impact them? If the returns were smaller I could reasonably believe it but beating the S&P500 returns by 4x+ strains credulity.

zlacker