We gave 5 LLMs $100K to trade stocks for 8 months

>>cheese+(OP)
> Grok ended up performing the best while DeepSeek came close to second. Almost all the models had a tech-heavy portfolio which led them to do well. Gemini ended up in last place since it was the only one that had a large portfolio of non-tech stocks.

I'm not an investor or researcher, but this triggers my spidey sense... it seems to imply they aren't measuring what they think they are.

>>bcrosb+l2
Yeah I mean if you generally believe the tech sector is going to do well because it has been doing well you will beat the overall market. The problem is that you don’t know if and when there might be a correction. But since there is this one segment of the overall market that has this steady upwards trend and it hasn’t had a large crash, then yeah any pattern seeking system will identify “hey this line keeps going up!” Would it have the nuance to know when a crash is coming if none of the data you test it on has a crash?

It would almost be more interesting to specifically train the model on half the available market data, then test it on another half. But here it’s like they added a big free loot box to the game and then said “oh wow the player found really good gear that is better than the rest!”

Edit: from what I causally remember a hedge fund can beat the market for 2-4 years but at 10 years and up their chances of beating the market go to very close to zero. Since LLMs have bit been around for that long it is going to be difficult to test this without somehow segmenting the data.

>>IgorPa+Z3
> It would almost be more interesting to specifically train the model on half the available market data, then test it on another half.

Yes, ideally you’d have a model trained only on data up to some date, say January 1, 2010, and then start running the agents in a simulation where you give them each day’s new data (news, stock prices, etc.) one day at a time.

>>tshadd+H7
I suspect trading firms have already done this to the maximum extent that it's profitable to do so. I think if you were to integrate LLMs into a trading algorithm, you would need to incorporate more than just signals from the market itself. For example, I hazard a guess you could outperform a model that operates purely on market data with a model that also includes a vector embedding of a selection of key social and news media accounts or other information sources that have historically been difficult to encode until LLMs.

>>hxtk+HC
"includes a vector embedding of a selection of key social and news media accounts or other information sources that have historically been difficult to encode until LLMs."

Not really. Sentiment analysis in social networks has been around for years. It's probably cheaper to by that analysis and feed it to LLMs than to have LLMs do it.

zlacker