If you're just using their completions/chat API, you're gonna be ok. As an ultimate fallback you can spin up H100s in the cloud and run vLLM serving a high-parameter open model like Llama 70B. Such models will catch up and their param counts will increase... eventually. But initially expect GPT-3.5-esque performance. vLLM gives you an OpenAI-compatible REST API over a range of models, so your existing client code mostly keeps working with a different base URL. Keep making things :))
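Rough sketch of what that swap looks like, assuming a Llama 2 70B chat checkpoint and vLLM's OpenAI-compatible server on its default port (the model name, port, and GPU count here are just placeholders):

```python
# Point the standard OpenAI client at a local vLLM server instead of api.openai.com.
# Start the server first, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model meta-llama/Llama-2-70b-chat-hf --tensor-parallel-size 4
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="not-needed",                 # vLLM ignores the key unless you configure one
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # must match the model the server loaded
    messages=[{"role": "user", "content": "Hello from my fallback stack"}],
)
print(resp.choices[0].message.content)
```

The point being: your app code barely changes. Swap the base URL and model name and the rest of your completions/chat calls stay as they are.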