zlacker

Exploring the limits of large language models as quant traders

submitted by rzk+(OP) on 2025-11-19 07:36:25 | 137 points 99 comments
[view article] [source] [go to bottom]

NOTE: showing posts with links only show all posts
1. kqr+X3[view] [source] 2025-11-19 08:20:51
>>rzk+(OP)
Super interesting! You can click the "live" link in the header to see how they performed over time. The (geometric) average result at the end seems to be that the LLMs are down 35 % from their initial capital – and they got there in just 96 model-days. That's a daily return of about -0.45 %, or a yearly return of -81 %, i.e. practically wiping out the starting capital.
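For anyone who wants to check the compounding arithmetic: backing out the per-day and annualized figures from "down 35 % in 96 model-days" is just a geometric-mean calculation (the 0.65 and 96 come from the numbers above).

```python
# Back out per-day and annualized returns from "down 35% in 96 model-days".
final_over_initial = 0.65          # portfolio ended at 65% of starting capital
days = 96

daily = final_over_initial ** (1 / days) - 1   # geometric average daily return
yearly = (1 + daily) ** 365 - 1                # same rate compounded over a year

print(f"daily:  {daily:+.2%}")
print(f"yearly: {yearly:+.2%}")
```
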

Although I lack the maths to determine it numerically (depends on volatility etc.), it looks to me as though all six are overbetting and would be ruined in the long run. It would have been interesting to compare against a constant fraction portfolio that maintains 1/6 in each asset, as closely as possible while optimising for fees. (Or even better, Cover's universal portfolio, seeded with joint returns from the recent past.)
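The constant-fraction benchmark suggested above is easy to sketch: hold 1/6 of capital in each asset and rebalance back to equal weights every period. The returns below are synthetic random-walk data for illustration, not the actual coin returns, and costs are ignored.

```python
import numpy as np

# Constant-fraction (1/6 each) portfolio, rebalanced every period.
# Synthetic per-period simple returns stand in for the six coins' data.
rng = np.random.default_rng(0)
n_periods, n_assets = 96 * 24, 6                    # e.g. hourly over 96 days
rets = rng.normal(0, 0.01, (n_periods, n_assets))

w = np.full(n_assets, 1 / n_assets)                 # target weights: 1/6 each
wealth = 1.0
for r in rets:
    wealth *= w @ (1 + r)   # period gross return; weights reset = rebalancing

print(f"rebalanced final wealth multiple: {wealth:.3f}")
```

With identical returns across assets, rebalancing changes nothing; the benefit (the "diversification return") comes entirely from imperfectly correlated assets, which is why the high pairwise correlations matter below.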

I couldn't resist starting to look into it. With no costs and no leverage, the hourly rebalanced portfolio just barely outperforms 4/6 coins in the period: https://i.xkqr.org/cfportfolio-vs-6.png. I suspect costs would eat up many of the benefits of rebalancing at this timescale.

This is not too surprising, given the similarity of coin returns. The mean pairwise correlation is 0.8, and the lowest is 0.68. Not particularly good for diversification returns. https://i.xkqr.org/coinscatter.png
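Those pairwise-correlation figures can be computed with np.corrcoef on the per-period return series, one row per coin. The series here are made up (a shared market factor plus idiosyncratic noise, to mimic the highly correlated regime); the real inputs would be the six coins' hourly returns.

```python
import numpy as np

# Synthetic stand-in for six highly correlated coin return series.
rng = np.random.default_rng(1)
common = rng.normal(0, 0.01, 500)                   # shared market factor
rets = np.array([common + rng.normal(0, 0.004, 500) for _ in range(6)])

corr = np.corrcoef(rets)                            # 6x6 correlation matrix
off_diag = corr[~np.eye(6, dtype=bool)]             # drop the diagonal 1s
print(f"mean pairwise correlation:   {off_diag.mean():.2f}")
print(f"lowest pairwise correlation: {off_diag.min():.2f}")
```
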

> difficulty executing against self-authored plans as state evolves

This is indeed also what I've found trying to make LLMs play text adventures. Even when given a fair bit of help in the prompt, they lose track of the overall goal and find some niche corner to explore very patiently, but ultimately fruitlessly.

◧◩
29. falcor+Od[view] [source] [discussion] 2025-11-19 09:48:24
>>callam+S8
20 years ago NNs were considered toys and it was "extremely obvious" to CS professors that AI couldn't be made to reliably distinguish between arbitrary photos of cats and dogs. But then in 2007 Microsoft released Asirra as a captcha problem [0], which prompted research, and we had an AI solving it not that long after.

Edit - additional detail: The original Asirra paper from October 2007 claimed "Barring a major advance in machine vision, we expect computers will have no better than a 1/54,000 chance of solving it" [0]. It took Philippe Golle from Palo Alto a bit under a year to get "a classifier which is 82.7% accurate in telling apart the images of cats and dogs used in Asirra" and "solve a 12-image Asirra challenge automatically with probability 10.3%" [1].

Edit 2: History is chock-full of examples of human ingenuity solving problems for very little external gain. And here we have a problem where the incentive is almost literally a money printing machine. I expect progress to be very rapid.

[0] https://www.microsoft.com/en-us/research/publication/asirra-...

[1] https://xenon.stanford.edu/~pgolle/papers/dogcat.pdf

◧◩◪◨⬒⬓
68. falcor+FQ[view] [source] [discussion] 2025-11-19 14:40:18
>>nl+aC
Ok, I'll take it. It definitely wasn't a business call at the level of Microsoft saying that everyone should be using it, but it was an actual service offered under the Microsoft umbrella and used by many sites in the wild, e.g. via this MediaWiki extension [0], for 8 years [1].

[0] https://www.mediawiki.org/wiki/Extension:Asirra

[1] https://web.archive.org/web/20150207180225/https%3A//researc...

◧◩◪◨
75. Lapsa+391[view] [source] [discussion] 2025-11-19 16:05:39
>>ritonl+0g
sorry dude. tried going down the rabbit hole but I'm too lazy and uninterested in it. read about it a month ago or so. perhaps the Daily News Sentiment Index uses LLMs, not sure. if you go far enough back through https://quantocracy.com/ you should be able to find it
◧◩
83. riboso+eu2[view] [source] [discussion] 2025-11-19 22:49:51
>>aswegs+e9
About that...

https://www.reddit.com/r/ClaudePlaysPokemon/comments/1otd4kl...

seems like the big issue is just the tooling needed to interact with Pokemon, and that calling an LLM for each button press is time-consuming.

◧◩◪◨
98. Lapsa+mBk[view] [source] [discussion] 2025-11-26 10:26:21
>>ritonl+0g
found one such index https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5763042 called Populism Index (POP) built from Wall Street Journal articles (not sure how publicly accessible it is)
[go to top]