For an individual use case, Llama is fine. Once you get to large workflows and need reliable outputs, GPT wins out substantially. I know all the papers and headlines about comparative performance, but that's on benchmarks.
I've found that benchmarks are great as a hygiene test, but pointless when you need to get work done.