That said, you should be able to fine-tune a 70B model on a single A100 using QLoRA. However, depending on the specifics of your dataset, it might actually be cheaper to run on an 8xA100 machine: that way you don't have to swap any weights out to the machine's non-GPU memory, and the time saved can be enough that the more expensive machine pays for itself.
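
For reference, the kind of QLoRA setup I mean looks roughly like the sketch below, using the Hugging Face transformers/peft/bitsandbytes stack. The model name, LoRA hyperparameters, and target modules are illustrative placeholders, not a tested recipe:

```python
# Sketch of single-GPU QLoRA fine-tuning setup (placeholders, not a tested recipe).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-70b-hf"  # assumption: any 70B causal LM

# 4-bit NF4 quantization keeps the frozen base weights small enough that a
# single 80GB A100 can mostly hold them; compute runs in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spills layers to CPU RAM if the GPU fills up -- the swapping cost mentioned above
)
model = prepare_model_for_kbit_training(model)

# Only the small LoRA adapter matrices are trained; the 4-bit base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The `device_map="auto"` line is where the single-GPU vs 8xA100 tradeoff shows up: on one A100 some layers or optimizer state may end up in CPU RAM and get shuttled back and forth, while on the bigger machine everything stays on the GPUs.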
If you have any experiences to share, successes or failures, please do.