zlacker

[return to "We gave 5 LLMs $100K to trade stocks for 8 months"]
1. naet+97[view] [source] 2025-12-04 23:47:53
>>cheese+(OP)
I used to work on a brokerage API geared at algorithmic traders, and in my (admittedly anecdotal) experience, many strategies that look great when back-tested on paper end up flopping for various reasons when actually executed in the real market. Even testing a strategy with real-time paper trading can play out differently than trading on the actual market, where other parties see your trades and respond to them. The post did list some potential disadvantages of backtesting, so they clearly aren't totally in the dark on it.
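
To make that concrete, here's a rough sketch (prices, slippage, and commission numbers entirely made up) of how even modest execution frictions eat into a paper-perfect backtest:

    # Illustrative only: same trade list, evaluated as a frictionless paper
    # backtest vs. with assumed execution costs.

    trades = [
        # (side, paper_fill_price, shares)
        ("buy", 100.00, 50),
        ("sell", 103.00, 50),
        ("buy", 101.50, 50),
        ("sell", 104.00, 50),
    ]

    def pnl(trades, slippage_bps=0.0, commission_per_trade=0.0):
        total = 0.0
        for side, price, shares in trades:
            slip = price * slippage_bps / 10_000
            # Real fills are worse than paper fills in both directions.
            fill = price + slip if side == "buy" else price - slip
            total += (-fill if side == "buy" else fill) * shares
            total -= commission_per_trade
        return total

    print("paper PnL:     ", pnl(trades))  # 275.0
    print("with frictions:", pnl(trades, slippage_bps=10, commission_per_trade=1.0))

And that still doesn't capture the market reacting to you, which you can't simulate at all.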

Deepseek did not sell anything, but did well by holding a lot of tech stocks. Concentrating everything in one sector is a somewhat risky strategy, but it has been a successful one recently, so it's not surprising that it performed well. It also seems like they only get to "trade" once per day, near the market close, so it isn't really ingesting data in real time and making decisions off of it.

What would really be interesting is if one of the LLMs switched its strategy to another sector at an appropriate time. Very hard to do, but very impressive if done correctly. I didn't see that anywhere, but I also didn't look closely at every single trade.

2. bmitc+Np[view] [source] 2025-12-05 02:13:53
>>naet+97
I've honestly never understood what backtesting even accomplishes, because of the things you mention: the time it takes to place and close trades (if they even close!), other parties' responses to your trades, the continuous and dynamic input of the market into your model, etc.

Is there any reference that explains the deep technicalities of backtesting and how it is supposed to actually influence your model development? It seems to me that you could spend a huge amount of effort on backtesting that distracts from building out models and tooling, and that effort might not even pay off, given that the backtesting environment is not the real market environment.

3. Maxata+6e2[view] [source] 2025-12-05 15:59:56
>>bmitc+Np
We use backtesting at my firm for two primary reasons: one, as a way to verify correctness, and two, as a way to assess risk.

We do not use it as a way to determine profitability.
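
To give a rough idea (the equity values and the 10% limit below are made up for illustration, not our actual setup), the risk side can be as simple as computing drawdown over the replayed equity curve and checking it against a policy limit:

    # Sketch of risk assessment from a backtest, independent of profit:
    # max drawdown on the replayed equity curve.

    def max_drawdown(equity_curve):
        """Largest peak-to-trough decline, as a fraction of the peak."""
        peak = equity_curve[0]
        worst = 0.0
        for value in equity_curve:
            peak = max(peak, value)
            worst = max(worst, (peak - value) / peak)
        return worst

    equity = [100_000, 101_200, 99_800, 97_500, 98_900, 102_300, 100_700]
    dd = max_drawdown(equity)
    assert dd <= 0.10, f"drawdown {dd:.1%} breaches the assumed 10% limit"
    print(f"max drawdown over the backtest: {dd:.1%}")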

4. bmitc+h03[view] [source] 2025-12-05 19:23:05
>>Maxata+6e2
This is interesting because I'm not immediately sure how you verify correctness and assess risk without also addressing profitability.

By assessing risk, do you just mean checking that it doesn't dump all your money and that you can at least maintain a stable investment cache?

Are you willing to say more about correctness? Is it the correctness of the models, of the software, or something else?

5. Maxata+Z13[view] [source] 2025-12-05 19:31:19
>>bmitc+h03
Profitability is not in any way considered a property of the correctness of an algorithm. An algorithm can be profitable and incorrect, and an algorithm can be correct but not profitable.

Correctness has to do with whether the algorithm performed the intended actions in response to the inputs/events provided to it, nothing more. For the most part, the correctness of an algorithm can be tested the same way most software is tested, i.e. with unit tests. But it's also worth testing the algorithm against live data / backtesting it, since it isn't feasible to cover every possible scenario in giant unit tests, and backtesting gives you pretty good coverage of a variety of real-world scenarios.
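
A toy example of the distinction (the strategy, event shape, and 100.0 threshold are all hypothetical, not anything we actually run): replay recorded events through the algorithm and assert on its actions against the spec, never on PnL.

    # Toy correctness check: replay recorded events through the strategy and
    # verify that its actions match the spec. No profitability anywhere.

    from dataclasses import dataclass

    @dataclass
    class Tick:
        symbol: str
        bid: float
        ask: float

    def threshold_strategy(tick, limit=100.0):
        """Spec: emit 'buy' when the ask drops below the limit, else 'hold'."""
        return "buy" if tick.ask < limit else "hold"

    replay = [
        # (recorded event, action the spec says we should have taken)
        (Tick("XYZ", 100.10, 100.20), "hold"),
        (Tick("XYZ", 99.80, 99.90), "buy"),
        (Tick("XYZ", 99.95, 100.05), "hold"),
    ]
    for tick, expected in replay:
        assert threshold_strategy(tick) == expected, (tick, expected)
    print("strategy behaved as specified on the replayed events")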
