zlacker

Useful tip.

From a strategic standpoint of privacy, cost and control, I immediately went for local models, because that allowed to baseline tradeoffs and it also made it easier to understand where vendor lock-in could happen, or not get too narrow in perspective (e.g. llama.cpp/open router depending on local/cloud [1] ).

With the explosion of popularity of CLI tools (claude/continue/codex/kiro/etc) it still makes sense to be able to do the same, even if you can use several strategies to subsidize your cloud costs (being aware of the lack of privacy tradeoffs).

I would absolutely pitch that and evals as one small practice that will have compounding value for any "automation" you want to design in the future, because at some point you'll care about cost, risks, accuracy and regressions.

[1] - https://alexhans.github.io/posts/aider-with-open-router.html

[2] - https://www.reddit.com/r/LocalLLaMA

replies(3): >>mogoma+Z >>cyanyd+7b >>lancek+nE

>>alexha+(OP)
can you recommend a setup with ollama and a cli tool? Do you know if I need a licence for Claude if I only use my own local LLM?

replies(2): >>alexha+c5 >>drifki+ge

>>mogoma+Z
What are your needs/constraints (hardware constraints definitely a big one)?

The one I mentioned called continue.dev [1] is easy to try out and see if it meets your needs.

Hitting local models with it should be very easy (it calls APIs at a specific port)

[1] - https://github.com/continuedev/continue

replies(1): >>wongar+Bh

>>alexha+(OP)
I think control should be top of the list here. You're talking about building work flows, products and long term practices around something that's inherently non-deterministic.

And the probability that any given model you use today is the same as what you use tomorrow is doubly doubtful:

1. The model itself will change as they try to improve the cost-per-test improves. This will necessarily make your expectations non-deterministic.

2. The "harness" around that model will change as business-cost is tightened and the amount of context around the model is changed to improve the business case which generates the most money.

Then there's the "cataclysmic" lockout cost where you accidently use the wrong tool that gets you locked out of the entire ecosystem and you are black listed, like a gambler in vegas who figures out how to count cards and it works until the house's accountant identifies you as a non-negligible customer cost.

It's akin to anti-union arguments where everyone "buying" into the cloud AI circus thinks they're going to strike gold and completely ignores the fact that very few will and if they really wanted a better world and more control, they'd unionize and limit their illusions of grandeur. It should be an easy argument to make, but we're seeing about 1/3 of the population are extremely susceptible to greed based illusions.,

replies(1): >>alexha+SJ

>>mogoma+Z
we recently added a `launch` command to Ollama, so you can set up tools like Claude Code easily: https://ollama.com/blog/launch

tldr; `ollama launch claude`

glm-4.7-flash is a nice local model for this sort of thing if you have a machine that can run it

replies(1): >>vortic+Ff

>>drifki+ge
I have been using glm-4.7 a bunch today and it’s actually pretty good.

I set up a bot on 4claw and although it’s kinda slow, it took twenty minutes to load 3 subs and 5 posts from each then comment on interesting ones.

It actually managed to correctly use the api via curl though at one point it got a little stuck as it didn’t escape its json.

I’m going to run it for a few days but very impressed so for for such a small model.

>>alexha+c5
I've also made decent experiences with continue, at least for autocomplete. The UI wants you to set up an account, but you can just ignore that and configure ollama in the config file

For a full claude code replacement I'd go with opencode instead, but good models for that are something you run in your company's basement, not at home

>>alexha+(OP)
Can you say a bit more about evals and your approach?

>>cyanyd+7b
You're right. Control is the big one and both privacy and cost are only possible because you have control. It's a similar benefit to the one of Linux distros or open source software.

The rest of your points are why I mentioned AI evals and regressions. I share your sentiment. I've pitched it in the past as "We can’t compare what we can’t measure" and "Can I trust this to run on its own?" and how automation requires intent and understanding your risk profile. None of this is new for anyone who has designed software with sufficient impact in the past, of course.

Since you're interested in combating non-determinism, I wonder if you've reached the same conclusion of reducing the spaces where it can occur and compound making the "LLM" parts as minimal as possible between solid deterministic and well-tested building blocks (e.g. https://alexhans.github.io/posts/series/evals/error-compound... ).