Even the best unquantized finetunes of llama2-70b are only somewhat better than GPT-3.5-turbo (and I'm not even sure they would beat the original GPT-3.5, which was smarter). They are nowhere near GPT-4 on any task that requires serious reasoning or instruction following.