You either need a backend with good batching support (vLLM), or, if you don't need much throughput, you can get by with a very low-end GPU or even no GPU at all using exLlama/llama.cpp.
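For illustration, here's a minimal sketch of what batched offline inference with vLLM looks like; the model name and prompts are placeholders, not a recommendation:

```python
# Minimal sketch of batched inference with vLLM.
# The model name is an example; substitute your own checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Submitting a list of prompts lets vLLM schedule them together
# (continuous batching) instead of serving one request at a time.
prompts = [f"Summarize item {i} in one sentence." for i in range(32)]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```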
OpenAI benefits from quantization/batching, optimized kernels, and very high utilization on their end, so the huge price gap versus a default HF Transformers instance is understandable. But even then, you are probably right about their aggressive pricing.
As for quality, you need a llama model fine-tuned on the target language (many already exist on Hugging Face), and possibly a custom grammar if your backend supports one.
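If you go the llama.cpp route, a grammar can be passed through the llama-cpp-python bindings. This is just a sketch, assuming a local GGUF fine-tune at a made-up path and a toy GBNF grammar:

```python
# Sketch of grammar-constrained generation via llama-cpp-python
# (Python bindings for llama.cpp). Model path and grammar are placeholders.
from llama_cpp import Llama, LlamaGrammar

# Toy GBNF grammar: output is restricted to a lowercase word plus a period.
GRAMMAR = r'''root ::= word "."
word ::= [a-z]+
'''
grammar = LlamaGrammar.from_string(GRAMMAR)

llm = Llama(model_path="./models/your-finetune-q4_k_m.gguf")
out = llm(
    "Answer with one word: what language is 'bonjour'?",
    grammar=grammar,
    max_tokens=16,
)
print(out["choices"][0]["text"])
```

The grammar biases decoding so only tokens matching the GBNF rules can be sampled, which is handy for forcing structured output out of smaller fine-tunes.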