Rather than just relying on pretraining, you'd use RL on the trade outcomes.
RL by backtesting cannot work because the real market is continuously changing: all the agents within it, both human and automated, are constantly updating their opinions and strategies. A policy tuned on historical data is optimized against participants who no longer behave that way.
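To make the non-stationarity point concrete, here is a toy sketch, not real RL and not a real market model. It assumes returns follow an AR(1) process, that the "learning" step just picks between a momentum and a mean-reversion rule by backtest PnL, and that other agents' adaptation shows up as the autocorrelation flipping sign in the live period. All names (`simulate_returns`, `pnl`, `fit_by_backtest`) and parameters are hypothetical.

```python
import random

def simulate_returns(phi, n, seed):
    """Synthetic AR(1) returns: r_t = phi * r_{t-1} + Gaussian noise."""
    rng = random.Random(seed)
    r, out = 0.0, []
    for _ in range(n):
        r = phi * r + rng.gauss(0.0, 1.0)
        out.append(r)
    return out

def pnl(returns, direction):
    """Total PnL of a sign-following rule.

    direction=+1 trades with the previous return (momentum),
    direction=-1 trades against it (mean reversion).
    """
    total = 0.0
    for prev, cur in zip(returns, returns[1:]):
        position = direction if prev > 0 else -direction
        total += position * cur
    return total

def fit_by_backtest(history):
    """Stand-in for training: keep whichever rule earned more on history."""
    return max((+1, -1), key=lambda d: pnl(history, d))

# Historical regime: positive autocorrelation, so momentum pays.
history = simulate_returns(phi=0.5, n=5000, seed=0)
# Live regime (assumed): other agents have adapted and the sign has flipped.
live = simulate_returns(phi=-0.5, n=5000, seed=1)

learned = fit_by_backtest(history)
backtest_pnl = pnl(history, learned)
live_pnl = pnl(live, learned)

print("learned direction:", learned)
print("backtest PnL positive:", backtest_pnl > 0)
print("live PnL positive:", live_pnl > 0)
```

The rule that looks best in the backtest is exactly the one that loses once the regime changes. A real market shifts less cleanly than a sign flip, but the failure mode is the same: the training signal comes from a distribution that the live market no longer follows.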