1. cyanyd+(OP) 2026-02-04 21:24:59
I think control should be top of the list here. You're talking about building workflows, products, and long-term practices around something that's inherently non-deterministic.

And it's doubly doubtful that any given model you use today is the same as the one you use tomorrow:

1. The model itself will change as they try to improve the cost-per-test. This will necessarily make the behavior you've built expectations around non-deterministic.

2. The "harness" around that model will change as business costs are tightened and the amount of context given to the model is adjusted to favor whichever business case generates the most money (a rough sketch of detecting this follows below).
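
Neither of those changes is visible from the API response alone. A minimal sketch of what I mean, assuming nothing about any provider's API (every name here is made up): hash the inputs you control and log them with each call, so when outputs drift you can at least tell whether you changed something or they did:

    import hashlib
    import json
    import time

    def request_fingerprint(model, params, system_prompt, tools):
        """Hash every input we control so output drift can be attributed."""
        def digest(obj):
            return hashlib.sha256(
                json.dumps(obj, sort_keys=True).encode()
            ).hexdigest()[:12]
        return {
            "model": model,                  # provider's version string, if exposed
            "params": digest(params),        # temperature, max_tokens, etc.
            "system_prompt": digest(system_prompt),
            "tools": digest(tools),          # the "harness" around the model
            "logged_at": time.time(),
        }

    # If two runs share a fingerprint but the outputs differ, the drift
    # came from the provider's side, not from your own changes.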

Then there's the "cataclysmic" lockout cost, where you accidentally use the wrong tool, get locked out of the entire ecosystem, and are blacklisted, like a gambler in Vegas who figures out how to count cards and it works until the house's accountant identifies you as a non-negligible customer cost.

It's akin to anti-union arguments: everyone "buying" into the cloud AI circus thinks they're going to strike gold, completely ignoring the fact that very few will. If they really wanted a better world and more control, they'd unionize and limit their illusions of grandeur. It should be an easy argument to make, but we're seeing that about a third of the population is extremely susceptible to greed-based illusions.

replies(1): >>alexha+Ly
2. alexha+Ly 2026-02-05 00:52:11
>>cyanyd+(OP)
You're right. Control is the big one, and both the privacy and the cost benefits are only possible because you have control. It's a similar benefit to the one you get from Linux distros or open-source software.

The rest of your points are why I mentioned AI evals and regressions. I share your sentiment. I've pitched it in the past as "We can’t compare what we can’t measure" and "Can I trust this to run on its own?", and as a matter of automation requiring intent and an understanding of your risk profile. None of this is new for anyone who has designed software with sufficient impact in the past, of course.
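
To make "can't compare what we can't measure" concrete, here's a minimal regression-style eval sketch (the golden cases, the stub model call, and the baseline constant are all placeholders I made up, not from any real harness): run a fixed golden set and fail when the pass rate drops below the last recorded baseline:

    import re

    GOLDEN_CASES = [
        {"input": "Extract the year: 'Released in 1999.'", "expected": "1999"},
        {"input": "Extract the year: 'Shipped 2003-04-01.'", "expected": "2003"},
    ]
    BASELINE_PASS_RATE = 1.0  # last known-good rate, checked into the repo

    def call_model(prompt):
        # Stand-in for a real provider call so the sketch runs as-is.
        match = re.search(r"\b(?:19|20)\d{2}\b", prompt)
        return match.group(0) if match else ""

    def run_evals():
        passed = sum(
            call_model(case["input"]).strip() == case["expected"]
            for case in GOLDEN_CASES
        )
        return passed / len(GOLDEN_CASES)

    if __name__ == "__main__":
        rate = run_evals()
        print(f"pass rate: {rate:.0%} (baseline {BASELINE_PASS_RATE:.0%})")
        assert rate >= BASELINE_PASS_RATE, "regression: model or harness drift"

The stub exists only so the sketch runs; the interesting part is checking the baseline into version control, so provider-side drift shows up as a failing build rather than a vague feeling.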

Since you're interested in combating non-determinism, I wonder if you've reached the same conclusion of reducing the spaces where it can occur and compound: making the "LLM" parts as minimal as possible between solid, deterministic, well-tested building blocks (e.g. https://alexhans.github.io/posts/series/evals/error-compound... ).
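
In case it helps to see the shape, a rough sketch of that structure with an assumed toy schema (nothing here is from the linked post): the LLM does exactly one narrow transformation, sandwiched between deterministic steps that refuse to pass malformed output along:

    import json

    def normalize(raw):
        # Deterministic pre-processing: the model always starts from a
        # known input shape.
        return " ".join(raw.split())

    def llm_extract(text):
        # The *only* non-deterministic step. Canned response so the sketch
        # runs; a real version would prompt the model for strict JSON.
        return '{"year": 1999}'

    def validate(payload):
        # Deterministic post-processing: reject anything off-schema instead
        # of letting a malformed answer compound downstream.
        data = json.loads(payload)  # raises on malformed output
        if not isinstance(data.get("year"), int):
            raise ValueError("schema violation: 'year' must be an int")
        return data

    def pipeline(raw):
        return validate(llm_extract(normalize(raw)))

    print(pipeline("Released   in 1999."))  # {'year': 1999}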
