Although I lack the maths to determine it numerically (depends on volatility etc.), it looks to me as though all six are overbetting and would be ruined in the long run. It would have been interesting to compare against a constant fraction portfolio that maintains 1/6 in each asset, as closely as possible while optimising for fees. (Or even better, Cover's universal portfolio, seeded with joint returns from the recent past.)
I couldn't resist starting to look into it. With no costs and no leverage, the hourly rebalanced portfolio just barely outperforms 4/6 coins in the period: https://i.xkqr.org/cfportfolio-vs-6.png. I suspect costs would eat up many of the benefits of rebalancing at this timescale.
This is not too surprising, given the similarity of the coin returns. The mean pairwise correlation is 0.8; the lowest is 0.68. Not particularly good for diversification returns. https://i.xkqr.org/coinscatter.png
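For what it's worth, the rebalancing comparison takes only a few lines. A minimal sketch with numpy, using placeholder random returns standing in for the real hourly coin data, and ignoring fees and slippage entirely:

```python
# Constant-fraction (1/6 each) rebalanced portfolio vs. buy-and-hold.
# `returns` is a (T, 6) array of per-period simple returns for six coins --
# hypothetical random data here, so correlations will be near zero, unlike
# the ~0.8 seen in the real coin data.
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(1000, 6))  # placeholder for real hourly returns

# Rebalanced each period: portfolio return is the equal-weighted mean return.
rebalanced = np.cumprod(1.0 + returns.mean(axis=1))

# Buy-and-hold: each coin compounds on its own, then average the final stakes.
buy_hold = np.cumprod(1.0 + returns, axis=0).mean(axis=1)

# Diversification check: mean pairwise correlation of the coin returns.
corr = np.corrcoef(returns.T)
mean_pairwise = corr[np.triu_indices(6, k=1)].mean()

print(rebalanced[-1], buy_hold[-1], mean_pairwise)
```

With real data you would substitute the hourly return matrix for the random placeholder; the gap between the two cumulative curves is the rebalancing premium that trading costs then have to beat.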
> difficulty executing against self-authored plans as state evolves
This is indeed also what I've found trying to make LLMs play text adventures. Even when given a fair bit of help in the prompt, they lose track of the overall goal and find some niche corner to explore very patiently, but ultimately fruitlessly.
A threshold in the single-digit-millisecond range allows rapid detection of price reversals (signaling the need to exit a position with the least loss) in even the most liquid real futures contracts (not counting rare "flash crash" events).
I don't think LLMs are anywhere close to "mastery" in chess or Go. Maybe a nitpick, but the point is that a NN created to be good at trading is likely to outperform LLMs at this task the same way NNs created specifically to be good at board games vastly outperform LLMs at those games.
Seems to me that the outcome would be near random because they are so poorly suited. Which might manifest as
> We also found that the models were highly sensitive to seemingly trivial prompt changes
since they're so general, you need to explore if and how you can use them in your domain. guessing "they're poorly suited" is just that: guessing. in particular:
> We also found that the models were highly sensitive to seemingly trivial prompt changes
this is all but obvious to anyone who has seriously looked at deploying these; that's why there are some very successful startups in the evals space.
Disagree. Go and chess are games with very limited rules. Successful trading, on the other hand, is not so much an arbitrary numbers game as it is analyzing events in the news happening right now. Agentic LLMs that do this and buy and sell accordingly might succeed here.
(Not what they did here, though
"For the first season, they are not given news or access to the leading “narratives” of the market.")
> The models engage in mid-to-low frequency trading (MLFT) trading, where decisions are spaced by minutes to a few hours, not microseconds. In stark contrast to high-frequency trading, MLFT gets us closer to the question we care about: can a model make good choices with a reasonable amount of time and information?
I have a really nice bridge to sell you...
This "failure" is just a grab at looking "cool" and "innovative", I'd bet. Anyone with a modicum of understanding of the tooling (or hell, experience; they've been around for a few years now, enough for people to build a feeling for this) knows that this is not a task for a pre-trained general LLM.
But I still think the experiment is interesting because it gives us insight into how LLMs approach risk management, and what effects on that we can have with prompting.
also, another curious property of the markets is their ability to destroy any persistent trading system: they revert to their core stochastic properties, in a constant ebb and flow from stability to instability that crescendos into systemic instability and rewrites the rules all over again.
i've tried all sorts of ways to do this, and without being a large institution able to absorb the noise for a neutral result, or to do legal quasi-insider trading via proximity, for the average joe the emotional/psychological hardness you need to survive and be in the <1% of traders is simply too much. it's not unlike sports or the arts: many dream the dream but only a few get interviewed and written about.
rather, i think to myself the best trade is the simplest one: buy shares, or invest in a business with money or time (i strongly recommend against the latter unless you have no other means), and sell at a higher price, or maintain a long-term DCF from a business you own as leverage/collateral to arbitrage whatever rate your central bank sets on assets that are, or will be, in demand.
to me it's clear where the LLM fits and doesn't, but ultimately it cannot, will not, must not replace your own agency.
Edit - additional detail: The original Asirra paper from October 2007 claimed "Barring a major advance in machine vision, we expect computers will have no better than a 1/54,000 chance of solving it" [0]. It took Philippe Golle from Palo Alto a bit under a year to get "a classifier which is 82.7% accurate in telling apart the images of cats and dogs used in Asirra" and "solve a 12-image Asirra challenge automatically with probability 10.3%" [1].
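The quoted numbers are easy to sanity-check: if each of the 12 images is classified independently with 82.7% accuracy, the whole challenge succeeds with probability 0.827^12.

```python
# Sanity check of Golle's figures: a 12-image Asirra challenge is solved only
# if all 12 images are classified correctly, so with 82.7% per-image accuracy
# (and assuming independence) the success probability is 0.827**12.
p_image = 0.827
p_challenge = p_image ** 12
print(round(p_challenge, 3))  # 0.102, in line with the reported ~10.3%
```

The small gap to the reported 10.3% presumably comes from the per-image classifications not being fully independent in practice.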
Edit 2: History is chock-full of examples of human ingenuity solving problems for very little external gain. And here we have a problem where the incentive is almost literally a money printing machine. I expect progress to be very rapid.
[0] https://www.microsoft.com/en-us/research/publication/asirra-...
What, so they're better at my hobbies than me? Someone give Claude a 3d printer!
You need domain knowledge to get this to work. Things like "we fed the model the market data" are actually non-obvious: there is more than one way to pre-process the data, and what the model sees will greatly affect what actions it comes up with. You also have to think about corner cases, eg when RL was applied to StarCraft (AlphaStar), they had to restrict the agent's action rate, that kind of thing. Otherwise the model gets stuck in an imaginary money fountain.
But yeah, the AI thing hasn't passed the quant trading community by. A lot is going on, with AI trading teams being hired at various shops.
I'm honestly more hopeful about AI replacing this process than the core algorithmic component, at least directly. (AI could help write the latter. But it's immediately useful for the former.)
If you read the paper, you'll note that they surveyed researchers about the current state of the art ("Based on a survey of machine vision literature and vision experts at Microsoft Research, we believe classification accuracy of better than 60% will be difficult without a significant advance in the state of the art.") and noted what had been achieved at PASCAL 2006 ("The 2006 PASCAL Visual Object Classes Challenge [4] included a competition to identify photos as containing several classes of objects, two of which were Cat and Dog. Although cats and dogs were easily distinguishable from other classes (e.g., "bicycle"), they were frequently confused with each other.").
I was working in an adjacent field at the time. I think the general feeling was that advances in image recognition were certainly possible, but no one knew how to get above the 90% accuracy level reliably. This was in the day of hand coded (and patented!) feature extractors.
OTOH, stock market prediction via learning methods has a long history, and there are plenty of reasons to think that long-term prediction is actually impossible. Unlike with vision systems, there isn't an existence proof we can point to and say "it must be possible", and in this case we are literally trying to predict the future.
Short term prediction works well in some cases in a statistical sense, but long term isn't something that new technology seems likely to solve.
If other market participants chose not to use something then that would show that it doesn't work.
But I also see this incredible growth curve in LLMs' improvement. Two years ago I wouldn't have expected LLMs to one-shot a web application or help me debug obscure bugs, and two years later I've been proven wrong.
I completely believe that trading is going to be saturated with AI traders in the future. And being able to predict and detect AI trading patterns is going to be important leverage for human traders, if they still exist.
Proves that LLMs are nowhere near AGI.
Regarding image classification: as I see it, a company like Microsoft surveying researchers about the state of the art and then making a business call to recommend its use as a captcha is a significantly more meaningful prediction than any single paper from an ML research group. My intent was just to demonstrate that it was widely considered to be a significant open problem, which it clearly was. That in turn led to wider interest in solving it, and it was solved soon after - much faster than expected by people I spoke to around that time.
Regarding stock market prediction, of course I'm not claiming that long-term prediction is possible. All I'm saying is that I don't see how quant trading could serve as a captcha - it's as pure a pattern-matching task as could be, and if AIs can employ all the context and tooling used by humans, I would expect them to be at least as good as humans within a few years. So my prediction is not the end of quant trading, but rather that much of the work of quants will be taken over by AIs.
Obviously a big part of trading at the moment is already being done by AIs, so I'm not making a particularly bold claim here. What I'm predicting (and I don't believe that anyone in the field would actually disagree) is that as tech advances, AIs will be given control of longer trading time horizons, moving from the current focus on HFT to day trading and then to longer term investment decisions. I believe that there will still be humans in the loop for many many years, but that these humans would gradually turn their focus to high level investment strategy rather than individual trades.
I've been following these for a while, and many of the trades taken by DeepSeek and Qwen were really solid.
BTC also performed abysmally during this period with a sustained chop down from $126k to $90k.
That's probably good news for us index fund investors. We need people to believe they're going to beat the market.
The big Quant hedge funds have been using machine learning for decades. I took the coursera RL in finance class years ago.
The idea you are going to beat Two Sigma at their own game with tokens is just an absurdity.
Personally, I think any individual on their own that claims they are doing anything in the algorithmic / ML high frequency space is full of shit.
I could talk like I am too and sound really impressive to someone outside the space. That is much different though than actually making money on what you claim you are doing.
It reminds me of an artist friend from when I was younger. She was an artist, and I quite liked her paintings. She would tell everyone she was an artist. She was also an encyclopedia when it came to anything art-related. She wasn't actually selling much art, though. She lived off the $10k-a-month allowance her rich father gave her. She wasn't even being dishonest, but without the full picture a person would just assume she was living off her art sales.
That's not what this is. It's a research paper from 3 researchers at MSR.
And if your feeding (or "harnessing", as the blog post puts it) results in reasoning like:
> RSI 7-period: 62.5 (neutral-bullish)
Then it is no better than normal automated trading, where the program logic is something along the lines of "if RSI > 80 then exit". And looking at the reasoning trace, that is exactly what the model is doing.
> BTC breaking above consolidation zone with strong momentum. RSI at 62.5 shows room to run, MACD positive at 116.5, price well above EMA20. 4H timeframe showing recovery from oversold (RSI 45.4). Targeting retest of $110k-111k zone. Stop below $106,361 protects against false breakout.
My understanding is that technical trading using EMAs/timeframes/RSI/MACD etc. is big in the crypto community. But to automate it you can simply write Python code.
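A minimal sketch of what that rule-based logic looks like in Python. The RSI here uses plain average gains/losses rather than Wilder's exponential smoothing, and the price data and the 80 threshold are hypothetical:

```python
# Threshold-based technical trading in a few lines: compute a simple RSI and
# emit an exit signal above 80. Data and parameters are hypothetical.
import numpy as np

def rsi(prices, period=14):
    """RSI from simple average gains/losses over the last `period` moves --
    a simplification of Wilder's original smoothed version."""
    deltas = np.diff(prices)
    gains = np.clip(deltas, 0, None)
    losses = np.clip(-deltas, 0, None)
    avg_gain = gains[-period:].mean()
    avg_loss = losses[-period:].mean()
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

prices = np.array([100, 101, 103, 102, 105, 107, 106, 109, 111, 110,
                   113, 115, 114, 117, 119], dtype=float)
signal = "exit" if rsi(prices) > 80 else "hold"
print(rsi(prices), signal)
```

On this strongly rising hypothetical series the RSI comes out above 80 and the rule says "exit" - no tokens spent.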
I don't know if this is a good use of LLMs. Seems like an overkill. Better use case might have been to see if it can read sentiments from Twitter or something.
haha, if it were that easy, wouldn't most of them do it? :-D
The thing is: it's fucking complicated, and most people will give up long before they reach any level of operational capability.
I've developed such a system for myself and I'm running it in production (though not with crypto). And while most people will see the complexity in "whatever trading magic you apply", it's QUITE the opposite:
- the trading logic itself is simple, it's ~300 lines
- what's not simple is everything else, in the context of "asset management": you need position tracking; state management (orders, positions, account, etc.); you need to be able to pour in whatever new quote data for whatever new asset you identify; the system needs to work stably in "mass mode" and be super robust, as data-provider quality is volatile; you need some type of accounting logic on your side; you need a very capable reporting engine (imagine managing 200 positions simultaneously); I could extend this list more or less indefinitely.
There is MUCH MORE in such an application than the question of "when and how do I trade" - my system's raw source is around 2 MB as of today, 3rd-party and OSS libs not included.
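To make the "everything else" point concrete, here is a heavily stripped-down, hypothetical sketch of just one of those pieces, position tracking; the names and structure are illustrative, not the commenter's actual system:

```python
# Hypothetical sketch of the non-trading plumbing: a tiny position ledger that
# the ~300 lines of trading logic would sit on top of. Real systems add order
# states, reconciliation, accounting, and reporting around this core.
from dataclasses import dataclass, field

@dataclass
class Position:
    symbol: str
    qty: float = 0.0
    avg_price: float = 0.0

    def fill(self, qty, price):
        """Apply a fill, tracking the average entry price of the open quantity."""
        new_qty = self.qty + qty
        if new_qty != 0 and qty * self.qty >= 0:  # opening or adding to a position
            self.avg_price = (self.avg_price * self.qty + price * qty) / new_qty
        self.qty = new_qty

@dataclass
class Portfolio:
    positions: dict = field(default_factory=dict)

    def on_fill(self, symbol, qty, price):
        self.positions.setdefault(symbol, Position(symbol)).fill(qty, price)

pf = Portfolio()
pf.on_fill("BTC-PERP", 0.5, 100_000)
pf.on_fill("BTC-PERP", 0.5, 110_000)
print(pf.positions["BTC-PERP"].avg_price)  # 105000.0
```

Even this toy version has edge cases (flips through zero, partial closes); multiply by order lifecycles, reconciliation against the broker, and reporting, and the 2 MB figure stops looking surprising.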
do you want to have a chat on WhatsApp? Then I can show you quite the opposite! :-) And in my case: nobody knows, only one friend who is also deep in this stuff; people doing this are usually more quiet, since nobody is interested at all. I have some contacts in academia and shared my ideas with them - none of them said "this won't work".
(Disclaimer: 25+y IT experience, 15 of them in finance)
But I haven't tested it so far, since I don't believe it either :D
Still, let me clarify: the trading logic, as you say, is simple and just 300 lines. That is part of what the LLMs seem to be doing in the post. The point I made is that this doesn't seem to be a good use case for LLMs, given that everything costs tokens. IMO you could run this in your complex application without spending that much money on tokens.
If you can explain why my original opinion, that tokens are wasted on something which can "simply" be done in Python, is wrong, I'm all ears.
Well I'm in the space, but I've come across more than one guy who discovered a money making algo, all on their own, with all the right ideas but without the industry standard terms for them.
All logic would suggest this shouldn't be possible, but what I've seen is what I've seen.
[0] https://www.mediawiki.org/wiki/Extension:Asirra
[1] https://web.archive.org/web/20150207180225/https%3A//researc...
- data extraction: it's possible to get pretty good accuracy on unstructured data, eg financial reports, with relatively little effort compared to before decent LLMs
- sentiment analysis: why bother with complicated sentiment analysis when you can just feed an article into an LLM for scoring?
- reports: you could use it to generate reports on your financial performance, current positions, etc.
- code: it can generate some code that might sometimes be useful in the development of a system
The issue is that these models don't really reason, and they trade in what might as well be a random way. For example, a stock might have just dropped 5%. One LLM might say we should buy now and follow a mean-reversion strategy; another may say we should short the stock and follow the trend. The same LLM may well give a different output on a different call. A minuscule difference in price, time, or other data will potentially change the output, when really a signal should be relatively robust. And if you're going to tell the model, say, "we want to look for mean reversion opportunities" - then why bother with an LLM?
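To underline that last question: a deterministic mean-reversion signal is a handful of lines, gives the same output for the same input every time, and moves smoothly under tiny data changes. A sketch with hypothetical window and threshold parameters:

```python
# If the instruction is already "look for mean reversion", the signal can be a
# deterministic z-score rule -- same input, same output, and a tiny price
# change moves the score only a tiny amount. Window/threshold are hypothetical.
import numpy as np

def mean_reversion_signal(prices, window=20, z_entry=2.0):
    recent = prices[-window:]
    z = (prices[-1] - recent.mean()) / (recent.std() + 1e-12)
    if z > z_entry:
        return "short"   # stretched above its mean -> bet on reversion down
    if z < -z_entry:
        return "long"    # stretched below its mean -> bet on reversion up
    return "flat"

prices = np.concatenate([np.full(19, 100.0), [95.0]])  # a sudden 5% drop
print(mean_reversion_signal(prices))  # prints "long": z is far below -2
```

Unlike the LLM behaviour described above, perturbing the last price by a cent changes the z-score by a correspondingly small amount rather than flipping the view from long to short.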
Another angle: LLMs are trained on the vast swathe of scammy internet content and rubbish relating to the stock market. 90%+ of active retail traders lose money. If an LLM is fed losing/scammy rubbish, how could it possibly produce a return?
they're not quant-bots that already exist to read in stock prices and make decisions. different kind of ML/AI
from TFA: "We also found that the models were highly sensitive to seemingly trivial prompt changes"
(And I'm fairly sure it would be pretty easy to build a system that uses an LLM and a few other small components to beat Pokemon Red. The experiment you are talking about is deliberately hobbled by using a stock LLM without any such tools to make the whole thing entertaining. But when you are trading, you'd want to give your LLM as much help as possible.)
Rather than just relying on pretraining, you'd use RL on the trade outcomes.
I'll still say that the trading period after October 10th has been brutally choppy. Only now do we have a clear direction (down), where you can at least short with some confidence.
LLMs can indeed replace the average human being.
Individual quant traders aren't competing with Two Sigma. If you're an individual quant trader and you find a signal with $500k/yr capacity, that's awesome. If you're Two Sigma you won't give a single cahoot if it's not a $50M/yr signal. Two completely different ball games. I doubt Two Sigma is even trading on Hyperliquid either.
I use LLMs a lot, and I work in finance, and I don't see how an LLM helps in this space.
Also, it looks like none of their data uses any kind of benchmark. It's purely "which model did better", which I don't think tells you much.
https://www.reddit.com/r/ClaudePlaysPokemon/comments/1otd4kl...
seems like the big issue is just spending time with the tooling to interact with Pokemon, and that calling an LLM for each button press is time-consuming.
This kind of error just feels comical to me, and it really makes it hard for me to believe that AGI is anywhere near. LLMs struggle to understand the order of datasets even when explicitly told. This is like showing a coin trick to a child, except perhaps even simpler.
No amount of added context or instructions seems to fix these kinds of issues in a way that doesn't still feel pretty hobbled. The only way to get the full power out of the model is to conform your problem to the expectations that seem to be baked in - i.e. just change your rendering coordinate system to be z-up.
In addition, I cannot imagine how this selection of securities was made. Is XRP seriously part of the proposed asset mix here?
It's hard not to look at this and view it as a marketing stunt. Nothing about the results are surprising and the setup does not seem to make any sense to me to begin with.
I followed the curve for the last month, scalping a few times. I get a feel that the panic point is ~$180 and the hype point ~$195; it's like that most swings. There were earnings yesterday; people are afraid the company is in over its head already and prefer to de-risk, which I do too sometimes on other stuff. It is true that nvidia is overpriced ofc, but I feel we have maybe a few good runs left, and that's where the risk, and therefore the potential reward, is. I enter around 184, and a bit more around 182. I go to sleep (I'm in China), and when I wake up I sell at 194. I got lucky, and I would not do it again before I understand why nvidia would swing again.
Is an LLM gonna be any better? My brain did a classic Bayes analysis: I used the recent past as a strong signal for my prediction of the future (a completely absurd bias ofc, but all traders are absurd humans), I played a company that wasn't gonna burn me too much, since I'm still happy to own shares of nvidia whatever the price, and the money put there was entirely losable without too much pain.
Do I need AI? Meh. For your next play, do you trust me or ChatGPT more? I can explain my decisions very coherently, with good caveats on my limits and biases, and warnings about what risk to afford when. I have experienced losses and gains, I know the effects and causes of both, and how to deal with them. I prefer me to it.
The idea isn't to beat them. It's to pick up the scraps. Same as every small trading operation.
I've seen the books of a guy who makes money hand over fist trading options. He'll be the first one to tell you what he does won't scale.
- Start just as they have here
- Keep improving the prompts in a huge variety of ways to see what improvements can be made
- Start getting more and more code generated to complete a larger and larger percentage of the work instead of textual prompting
- Start fixing the worst parts with real human-written code/tools
- Finally, show a fully working solution that does well, with a full analysis of what kind of human intervention was necessary, and even explore what kind of prompting could lead to this human-intuited tooling going to whatever incredible lengths are necessary to hand-hold the models in the right direction
otherwise... i don't get the point of stopping and saying "doesn't do great"
The reason RL by backtesting cannot work is that the real market is continuously changing: all the agents within it, both human and automated, are constantly updating their opinions and strategies.
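A toy illustration of that point, on purely synthetic data: tune a simple momentum threshold on an autocorrelated (trending) regime, then evaluate it on an anti-correlated (mean-reverting) continuation. This is a sketch of regime sensitivity, not a real backtest; all parameters and data are made up.

```python
# Illustration of the regime problem: the best in-sample momentum threshold on
# one synthetic regime says nothing about a regime-shifted continuation.
import numpy as np

rng = np.random.default_rng(42)

def pnl(returns, threshold):
    """Momentum rule: hold the next period iff the last return beat threshold."""
    signal = (returns[:-1] > threshold).astype(float)
    return float((signal * returns[1:]).sum())

noise = rng.normal(0.001, 0.01, 2000)
trending = noise + 0.3 * np.concatenate([[0.0], noise[:-1]])    # positive autocorr
noise2 = rng.normal(0.001, 0.01, 2000)
reverting = noise2 - 0.3 * np.concatenate([[0.0], noise2[:-1]])  # negative autocorr

# "Train" by picking the threshold that maximises in-sample PnL...
thresholds = np.linspace(-0.01, 0.01, 21)
best = max(thresholds, key=lambda t: pnl(trending, t))

# ...then watch it meet a market whose stochastic properties have changed.
print("in-sample pnl:", pnl(trending, best))
print("out-of-regime pnl:", pnl(reverting, best))
```

The tuned threshold is, by construction, the in-sample optimum; once the autocorrelation flips sign, the edge it encoded is exactly the wrong bet, which is the "rules rewritten all over again" problem in miniature.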