That said, you should be able to fine-tune a 70B model on a single A100 using QLoRA. However, depending on the specifics of your dataset, it might actually be cheaper to run on an 8xA100 machine: that way you don't have to swap any weights out to the machine's non-GPU memory, and the time saved can be enough that the more expensive machine pays for itself.
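
For reference, the kind of QLoRA setup I mean looks roughly like the sketch below, using the Hugging Face transformers/peft/bitsandbytes stack. The model name, LoRA hyperparameters, and target modules are illustrative placeholders, not a tested recipe:

```python
# Sketch of single-GPU QLoRA fine-tuning setup (placeholders, not a tested recipe).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-70b-hf"  # assumption: any 70B causal LM

# 4-bit NF4 quantization keeps the frozen base weights small enough that a
# single 80GB A100 can mostly hold them; compute runs in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spills layers to CPU RAM if the GPU fills up -- the swapping cost mentioned above
)
model = prepare_model_for_kbit_training(model)

# Only the small LoRA adapter matrices are trained; the 4-bit base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The `device_map="auto"` line is where the single-GPU vs 8xA100 tradeoff shows up: on one A100 some layers or optimizer state may end up in CPU RAM and get shuttled back and forth, while on the bigger machine everything stays on the GPUs.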
If you have any experiences to share, successes or failures, please do.