zlacker

[return to "Exploring the limits of large language models as quant traders"]
1. callam+S8 2025-11-19 09:07:32
>>rzk+(OP)
The limits of LLMs for systematic trading were and are extremely obvious to anybody with a basic understanding of either field. You may as well be flipping a coin.
2. falcor+Od 2025-11-19 09:48:24
>>callam+S8
20 years ago NNs were considered toys, and it was "extremely obvious" to CS professors that AI couldn't be made to reliably distinguish between arbitrary photos of cats and dogs. But then in 2007 Microsoft released Asirra as a captcha problem [0], which prompted research, and we had an AI solving it not that long after.

Edit - additional detail: The original Asirra paper from October 2007 claimed "Barring a major advance in machine vision, we expect computers will have no better than a 1/54,000 chance of solving it" [0]. It took Philippe Golle from Palo Alto a bit under a year to get "a classifier which is 82.7% accurate in telling apart the images of cats and dogs used in Asirra" and "solve a 12-image Asirra challenge automatically with probability 10.3%" [1].
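
(Those two figures are consistent, incidentally: solving a 12-image challenge means getting all 12 classifications right, and 0.827^12 ≈ 0.10.)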

Edit 2: History is chock-full of examples of human ingenuity solving problems for very little external gain. And here we have a problem where the incentive is almost literally a money printing machine. I expect progress to be very rapid.

[0] https://www.microsoft.com/en-us/research/publication/asirra-...

[1] https://xenon.stanford.edu/~pgolle/papers/dogcat.pdf

3. lambda+5k 2025-11-19 10:52:03
>>falcor+Od
What makes trading such a special case is that as you use new technology to increase the capability of your trading system, other market participants you are trading against will be doing the same; it's a never-ending arms race.
4. callam+uR 2025-11-19 14:43:42
>>lambda+5k
The only applications of generative AI I can envisage for trading, systematically or otherwise, are the following:

  - data extraction: It's possible to get pretty good accuracy on unstructured data, e.g. financial reports, with relatively little effort compared to before decent LLMs
  - sentiment analysis: Why bother with complicated sentiment analysis when you can just feed an article into an LLM for scoring? (rough sketch below)
  - reports: You could use it to generate reports on your financial performance, current positions, etc.
  - code: It can generate some code that might sometimes be useful in the development of a system
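
On the sentiment point, a minimal sketch of the sort of thing I mean. This assumes the OpenAI Python client; the model name, prompt and scoring format are illustrative, not a recommendation:

    # Sketch: score an article's sentiment with an LLM.
    # Model name and prompt are illustrative placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def sentiment_score(article_text: str) -> float:
        """Return a sentiment score in [-1, 1] for a news article."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Rate the sentiment of this article toward the "
                            "company discussed as a single number between "
                            "-1 (very negative) and 1 (very positive). "
                            "Reply with the number only."},
                {"role": "user", "content": article_text},
            ],
            temperature=0,
        )
        return float(response.choices[0].message.content.strip())
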
The issue is that these models don't really reason, and they trade in what might as well be a random way. For example, a stock might have just dropped 5%. One LLM might say we should buy the stock now and follow a mean-reversion strategy. Another may say we should short the stock and follow the trend. The same LLM may not give the same output on a different call. A minuscule difference in price, time or other data will potentially change the output, when really a signal should be relatively robust.
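
That instability is easy to measure, too. A minimal sketch, where signal_fn stands in for whatever prompt/model combination you're using: jitter the input by a few basis points and count how often the answer flips.

    # Robustness check: perturb the input slightly and count how often
    # the trade signal changes. signal_fn is a stand-in for your actual
    # LLM call, returning e.g. "buy", "sell" or "hold".
    import random
    from collections import Counter

    def stability(signal_fn, base_move: float = -5.0, trials: int = 50) -> Counter:
        counts = Counter()
        for _ in range(trials):
            noisy_move = base_move + random.uniform(-0.05, 0.05)  # tiny jitter
            counts[signal_fn(noisy_move)] += 1
        return counts  # a robust signal concentrates on one action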

And if you're going to tell the model say, 'we want to look for mean reversion opportunities' - then why bother with an LLM?

Another angle: LLMs are trained on the vast swathe of scammy internet content and rubbish relating to the stock market. 90%+ of active retail traders lose money. If an LLM is fed on losing / scammy rubbish, how could it possibly produce a return?

5. falcor+mZ 2025-11-19 15:19:25
>>callam+uR
> If an LLM is fed on losing / scammy rubbish, how could it possibly produce a return?

Rather than just relying on pretraining, you'd use RL on the trade outcomes.
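
In the simplest form, something like a vanilla REINFORCE update with per-trade PnL as the reward. A toy sketch (the "market" and the tiny softmax policy here are stand-ins, not a real setup):

    # Toy REINFORCE loop: reward = PnL of each simulated trade.
    # A real setup would put the LLM (or a head on top of it) in
    # place of this tiny softmax policy.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(3)  # logits over actions: buy / hold / sell

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    lr = 0.1
    for step in range(1000):
        probs = softmax(theta)
        action = rng.choice(3, p=probs)
        market_move = rng.normal()                          # toy market return
        reward = {0: 1, 1: 0, 2: -1}[action] * market_move  # PnL as reward
        grad = -probs                                       # d log pi / d theta
        grad[action] += 1.0
        theta += lr * reward * grad                         # REINFORCE update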

6. lambda+ei7 2025-11-21 14:20:36
>>falcor+mZ
RL would reasonably be expected to work if the market had some sort of discoverable static behavior.

The reason why RL by backtesting cannot work is that the real market is continuously changing, as all the agents within it, both human and automated, are constantly updating their opinions and strategies.
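
A toy simulation makes the point (entirely synthetic numbers): fit a trading direction on one regime, then evaluate it after the drift flips, which is roughly what a changing market does to a backtest-trained agent.

    # Toy non-stationarity demo: a strategy fit on one regime loses
    # once the regime changes. Purely synthetic data.
    import numpy as np

    rng = np.random.default_rng(1)
    regime_a = rng.normal(+0.05, 1.0, 10_000)  # backtest period: positive drift
    regime_b = rng.normal(-0.05, 1.0, 10_000)  # live period: drift has flipped

    direction = np.sign(regime_a.mean())        # "learned" from the backtest
    print("backtest PnL:", direction * regime_a.sum())  # looks great
    print("live PnL:    ", direction * regime_b.sum())  # loses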
