We gave 5 LLMs $100K to trade stocks for 8 months

>>cheese+(OP)
> Grok ended up performing the best while DeepSeek came close to second. Almost all the models had a tech-heavy portfolio which led them to do well. Gemini ended up in last place since it was the only one that had a large portfolio of non-tech stocks.

I'm not an investor or researcher, but this triggers my spidey sense... it seems to imply they aren't measuring what they think they are.

>>bcrosb+l2
Yeah I mean if you generally believe the tech sector is going to do well because it has been doing well you will beat the overall market. The problem is that you don’t know if and when there might be a correction. But since there is this one segment of the overall market that has this steady upwards trend and it hasn’t had a large crash, then yeah any pattern seeking system will identify “hey this line keeps going up!” Would it have the nuance to know when a crash is coming if none of the data you test it on has a crash?

It would almost be more interesting to specifically train the model on half the available market data, then test it on another half. But here it’s like they added a big free loot box to the game and then said “oh wow the player found really good gear that is better than the rest!”

Edit: from what I casually remember, a hedge fund can beat the market for 2-4 years, but at 10 years and up their chances of beating the market go to very close to zero. Since LLMs have not been around for that long, it is going to be difficult to test this without somehow segmenting the data.

>>IgorPa+Z3
> It would almost be more interesting to specifically train the model on half the available market data, then test it on another half.

Yes, ideally you’d have a model trained only on data up to some date, say January 1, 2010, and then start running the agents in a simulation where you give them each day’s new data (news, stock prices, etc.) one day at a time.
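A rough sketch of what that day-at-a-time loop could look like, to make the "no peeking" part concrete. Everything here is made up for illustration: `walk_forward`, the `decide` callback, and the toy prices are hypothetical, it handles a single stock with no fees, and it says nothing about how the model itself would be trained.

```python
from datetime import date

def walk_forward(prices, cutoff, decide, cash=100_000.0):
    """Replay prices one day at a time, starting at the cutoff date.

    prices: list of (date, price) tuples sorted by date (toy data, an assumption).
    decide: hypothetical agent callback; it receives only the history before
            the current day and returns the fraction of equity to hold (0..1).
    """
    shares = 0.0
    # Everything before the cutoff is what the agent may know up front.
    history = [(d, p) for d, p in prices if d < cutoff]
    for d, p in prices:
        if d < cutoff:
            continue
        # The agent decides using data strictly before today...
        fraction = min(max(decide(history), 0.0), 1.0)
        # ...then the portfolio is rebalanced at today's price.
        equity = cash + shares * p
        shares = equity * fraction / p
        cash = equity - shares * p
        # Only now is today's data revealed for tomorrow's decision.
        history.append((d, p))
    # Mark the final position to the last available price.
    return cash + shares * prices[-1][1]

SAMPLE_PRICES = [
    (date(2009, 12, 30), 90.0),
    (date(2009, 12, 31), 95.0),
    (date(2010, 1, 1), 100.0),
    (date(2010, 1, 2), 110.0),
    (date(2010, 1, 3), 120.0),
]

def buy_and_hold(history):
    # Trivial stand-in agent: always fully invested.
    return 1.0

final_equity = walk_forward(SAMPLE_PRICES, date(2010, 1, 1), buy_and_hold)
```

The point of the structure is that `decide` never receives the current or any future day, so whatever the agent does can only come from data before the day it trades.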

>>tshadd+H7
I mean ultimately this is an exercise in frustration because if you do that you will have trained your model on market patterns that might not be in place anymore. For example after the 2008 recession regulations changed. So do market dynamics actually work the same in 2025 as in 2005? I honestly don’t know but intuitively I would say that it is possible that they do not.

I think a potentially better way would be to segment the market up to today but take half or 10% of all the stocks and make only those available to the LLM. Then run the test on the rest. This accounts for rules and external forces changing how markets operate over time. And you can do this over and over picking a different 10% market slice for training data each time.
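Something like this is what I have in mind for the repeated slicing, as a minimal sketch. The function name `ticker_splits`, the default of 5 rounds, and the fake ticker list are all assumptions for illustration; it only produces the splits and does no training or evaluation.

```python
import random

def ticker_splits(tickers, train_fraction=0.10, rounds=5, seed=0):
    """Yield (train, test) ticker lists, drawing a fresh random slice each round.

    train_fraction: share of tickers made available for training (10% here).
    rounds and seed are illustrative assumptions, not from any real setup.
    """
    rng = random.Random(seed)
    n_train = max(1, round(len(tickers) * train_fraction))
    for _ in range(rounds):
        # Pick this round's training slice without replacement.
        train = set(rng.sample(tickers, n_train))
        # Everything not in the training slice is held out for testing.
        test = [t for t in tickers if t not in train]
        yield sorted(train), test

# Hypothetical universe of 20 placeholder tickers.
universe = [f"T{i}" for i in range(20)]
splits = list(ticker_splits(universe, train_fraction=0.10, rounds=3))
```

Each round covers the same time period but a different set of stocks, so the rules of the market are held constant while the names the model has seen change.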

But then your problem is that if you exclude, let’s say, Intel from your training data and AMD from your testing data, then their ups and downs don’t really make sense since they are direct competitors. If you separate by market segment, then training the model on software tech companies might not actually tell you accurately how it would do for commodities or currency trading. Or maybe I am wrong and trading is trading no matter what you are trading.

>>IgorPa+9g
> you will have trained your model on market patterns that might not be in place anymore

My working definition of technical analysis [0]

[0]: https://en.wikipedia.org/wiki/Technical_analysis

>>chris_+kj
It is always fun (in a broad sense of that word) when I make a comment on an industry I know nothing about and somehow stumble onto a thing that not only has a name but also research. I am sure there is a German word for that feel of discovering something that countless others have already discovered.

>>IgorPa+8k
Any time I invent a cool thing, I go and try and find it online. Usually it's already an established product, which totally validates my feeling that the thing I invented is cool and would be a good product. :D

Occasionally it's (as far as I can tell) a legitimately new 'wow that's obvious' style thing and I consider prototyping it. :)

>>taneq+Bt
What have you prototyped recently? Anything you have released to market? I'm in the same general area but am teetering on actually launching products. Wouldn't mind connecting with a like-minded engineer.
