For about 1,000 input tokens (and the resulting ~1,000 output tokens), to my surprise, GPT-3.5 Turbo was roughly 100x cheaper than Llama 2.
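For concreteness, here is the back-of-the-envelope arithmetic (the GPT-3.5 Turbo numbers are the mid-2023 list prices; the Llama 2 rate is a placeholder assumption, so plug in whatever your hosted or amortized self-hosting rate works out to):

```python
# Rough cost per request: ~1,000 input + ~1,000 output tokens.
# Prices are USD per 1K tokens. GPT-3.5 Turbo values are the mid-2023
# list prices; the Llama 2 value is a placeholder assumption.
PRICES = {
    "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
    "llama-2-hosted": {"input": 0.20, "output": 0.20},  # placeholder
}

def cost_per_request(model: str, input_tokens: int = 1000, output_tokens: int = 1000) -> float:
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

for model in PRICES:
    print(f"{model}: ${cost_per_request(model):.4f} per request")

ratio = cost_per_request("llama-2-hosted") / cost_per_request("gpt-3.5-turbo")
print(f"ratio: {ratio:.0f}x")  # roughly 100x with these placeholder numbers
```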
Llama 7B wasn't up to the task, FYI; it produced very poor translations.
I believe that OpenAI priced GPT-3.5 aggressively cheap in order to make it a no-brainer to rely on them rather than on other vendors (or even open-source models).
I'm curious whether others have gotten different results.
I built two such systems after burning that much in a week on ChatGPT.
* Chenbro RM42300-F 4U rackmount server chassis (remove the air filter on the 120mm fan; put two decent 80mm exhaust fans at the rear).
* Two used air-cooled 3090s, about $650 a piece on eBay. Check the slot width and make sure everything will fit on your motherboard. Do a burn-in when you get them, because used GPUs can be hit or miss.
* 5950X CPU (overkill, I just had it).
* 128GB DDR4.
* Motherboard with the X570 chipset and dual PCIe x16 slots. These will bifurcate to x8 PCIe 4.0 lanes per GPU, which is enough bandwidth to push the GPUs to max IME (see the link-check sketch after this list).
* 1200W+ ATX power supply.
* An eBay "u.2 pcie 3.84TB" drive plus an adapter for an m.2 NVMe slot (again, what I had, and it's cheap).
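If you want to sanity-check the bifurcation, nvidia-smi can report each card's live PCIe link state. A minimal sketch, assuming the standard NVIDIA driver utilities are installed (note the link can downtrain to a lower gen at idle, so check under load):

```python
import csv
import io
import subprocess

# Query each GPU's name and current PCIe link generation/width.
# Under load, the 3090s should report gen 4 at width x8 with the
# X570 bifurcation setup described above.
out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

for row in csv.reader(io.StringIO(out)):
    index, name, gen, width = (field.strip() for field in row)
    print(f"GPU {index} ({name}): PCIe gen {gen} x{width}")
```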
If you're going to really hammer the thing, I would power-limit the 3090s to 320W (down from 350W). The performance change isn't really noticeable, and it keeps temps in better shape.
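Setting the limit is just an nvidia-smi call per GPU (needs root); a sketch wrapping it in Python below. Persistence mode keeps the driver loaded so the setting sticks between jobs, but it still resets on reboot, so run this from a startup script or systemd unit:

```python
import subprocess

POWER_LIMIT_W = 320  # down from the 3090's 350 W default

# Enable persistence mode so per-GPU settings stick between jobs.
# The limit still resets on reboot, so re-run this at startup.
subprocess.run(["nvidia-smi", "-pm", "1"], check=True)

# Cap each of the two 3090s (indices 0 and 1 assumed here).
for gpu_index in ("0", "1"):
    subprocess.run(
        ["nvidia-smi", "-i", gpu_index, "-pl", str(POWER_LIMIT_W)],
        check=True,
    )
```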