Are fine-tuning datasets required to be input/output pairs? Or instead, can the fine-tuning be autoregressive (predict the next token throughout this corpus of unlabeled documents)?
For further reference you can lookup "next-token prediction objective".