But for each nine of reliability you want out of llms everyone's assuming it's just a linear growth. I don't think it is. I think it's polynomial at least.
As for your tasks and maybe it's just cuz I'm using chat GPT, but I asked it to Port sed, something with full open source code availability, tons of examples/test cases, a fully documented user interface and I wanted it moved to Java as a library.
And it failed pretty spectacularly. Yeah it got the very very very basic functionality of sed.