zlacker

[return to "We gave 5 LLMs $100K to trade stocks for 8 months"]
1. sethop+W[view] [source] 2025-12-04 23:13:11
>>cheese+(OP)
> Testing GPT-5, Claude, Gemini, Grok, and DeepSeek with $100K each over 8 months of backtested trading

So the results are meaningless - these LLMs have the advantage of foresight over historical data.

2. PTRFRL+f1[view] [source] 2025-12-04 23:14:57
>>sethop+W
> We were careful to only run the backtest after each model’s training cutoff date. That way we could be sure the models couldn’t have memorized market outcomes.
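A minimal sketch of that sanity check, assuming hypothetical cutoff dates (the article does not list the actual cutoffs for these models):

```python
from datetime import date

# Hypothetical training cutoff dates -- illustrative only,
# not the real cutoffs for these models.
CUTOFFS = {
    "gpt-5": date(2024, 10, 1),
    "claude": date(2024, 11, 1),
    "gemini": date(2024, 11, 1),
    "grok": date(2024, 12, 1),
    "deepseek": date(2024, 7, 1),
}

def backtest_window_is_clean(start: date, cutoffs: dict) -> bool:
    """True only if the backtest starts after every model's cutoff,
    so no model could have memorized the market data it trades on."""
    return all(start > cutoff for cutoff in cutoffs.values())
```

With these assumed cutoffs, a backtest starting 2025-01-01 would pass the check, while one starting 2024-09-01 would not.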
3. plufz+b2[view] [source] 2025-12-04 23:20:28
>>PTRFRL+f1
I know very little about what the environment where they run these models looks like, but surely they have access to tools like vector embeddings with more current data on various topics?
4. discon+E6[view] [source] 2025-12-04 23:44:53
>>plufz+b2
You can determine (via the API, or to a lesser degree through the settings in the web client) which tools, if any, a model can use.
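As a sketch of what that means in practice, here is a hypothetical OpenAI-style request builder (the model name and exact payload shape are assumptions; providers differ): when the `tools` field is omitted entirely, the model has no tool access and can only use what it learned in training.

```python
# Sketch of an OpenAI-style chat completion request body.
# "gpt-5" is a placeholder model name for illustration.
def build_request(prompt: str, tools: list = None) -> dict:
    """Build a request body; when `tools` is None, no tools are
    granted, so the model cannot fetch current market data."""
    body = {
        "model": "gpt-5",
        "messages": [{"role": "user", "content": prompt}],
    }
    if tools is not None:
        body["tools"] = tools
    return body
```

Calling `build_request("Pick a stock")` yields a payload with no `tools` key at all, which is the tool-free setup the experiment would need.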
5. plufz+9T[view] [source] 2025-12-05 08:08:23
>>discon+E6
But isn’t that more a question of which MCPs you can configure it to use? Do we have any idea what secret-sauce stuff they run on top? Surely it’s not just a raw model that they’re executing?