Unless you can run the LLM locally, on a computer you own, you are now completely dependent on a remote centralized system to do your work. Whoever controls that system can arbitrarily raise prices, subtly manipulate the outputs, store and do anything they want with the inputs, or even suddenly cease to operate. And since, according to this article, only the latest and greatest LLM is acceptable (I saw that exact same argument six months ago), running locally is not viable (in a recent discussion, someone mentioned a home server with something like 384 GB of RAM just to run one LLM locally).
To those of us who like Free Software because of the freedom it gives us, this is a severe regression.
Sure, but that is not the point of the article. LLMs are useful. The fact that you are dependent on someone else is a different problem, like being dependent on Microsoft for your office suite.
It's fair to be worried about depending on LLMs. But if we are talking about centralized and proprietary systems, I find the dependence on things like AWS or Azure more problematic.
Self-hosting has always had a lot of drawbacks compared with commercial solutions. I bet my self-hosted file server has worse reliability than Google Drive, and my self-hosted git server handles fewer concurrent users than GitHub.
That's something you must accept when self-hosting.
So when you self-host an LLM, you must either accept a drop in output quality or spend a small fortune on hardware.
Raspberry Pi was a huge step forward; the move to LLMs is two steps back.
* Not even counting cellular data carriers, I have a choice of at least five ISPs in my area. And if things get really bad, I can go down to my local library to politely encamp myself and use their WiFi.
* I've personally no need for a cloud provider, but I've spent a lot of time working on cloud-agnostic stuff. All the major cloud providers (and many of the minors) provide compute, storage (whether block, object, or relational), and network ingress and egress. As long as you don't deliberately tie yourself to the vendor-specific stuff, you're free to choose among all available providers (a sketch of the idea follows this list).
* I run Linux. Enough said.
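To make the vendor-agnostic point concrete, here's a minimal sketch of the idea in Python (the names and the toy adapter are mine, not from any particular library): application code talks to a neutral interface, and each provider's SDK stays behind its own adapter.

    # Provider-agnostic storage: the app only sees ObjectStore, so swapping
    # providers means swapping one adapter class, not rewriting the app.
    import pathlib
    from typing import Protocol

    class ObjectStore(Protocol):
        def put(self, key: str, data: bytes) -> None: ...
        def get(self, key: str) -> bytes: ...

    class LocalDiskStore:
        """Filesystem-backed adapter; an S3 or GCS adapter would wrap the vendor SDK instead."""
        def __init__(self, root: str) -> None:
            self.root = pathlib.Path(root)

        def put(self, key: str, data: bytes) -> None:
            path = self.root / key
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)

        def get(self, key: str) -> bytes:
            return (self.root / key).read_bytes()

    def backup(store: ObjectStore, key: str, data: bytes) -> None:
        store.put(key, data)  # no vendor-specific calls in application code

    backup(LocalDiskStore("/tmp/backups"), "notes.txt", b"hello")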
Maven Central is gone and you have no proxy set up, or your local cache is busted? Poof, you're fucking gone: all your Springs, Daggers, Quarkuses and all the third-party crap that makes up your program is gone. The same applies to the bazillion JS and Rust libraries.
A guy here says you need 4 TB for a PyPI mirror and 285 GB for npm:
https://stackoverflow.com/questions/65995150/is-it-possible-...
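For what it's worth, once such a mirror exists, pointing an install at it is a one-flag change. A minimal sketch in Python, assuming a hypothetical local devpi mirror at its default address (the URL and package name are just examples):

    # Install from a local index instead of pypi.org.
    # Assumes a local mirror is already running at LOCAL_INDEX.
    import subprocess
    import sys

    LOCAL_INDEX = "http://localhost:3141/root/pypi/+simple/"

    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--index-url", LOCAL_INDEX, "requests"],
        check=True,
    )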
We're not yet to that same point for performance of local LLM models, AFAICT, though I do enjoy messing around with them.
In 20 years, memory has grown 32x (five doublings).
That means we could have 16 TB memory computers in 2045.
That can unlock a lot of possibilities, even if 1 TB is not enough by then (better architectures, more compact representations of data, etc.).
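As a quick sanity check of that extrapolation (assuming roughly 512 GB for a big machine today and that the 32x-per-20-years pace holds):

    # 32x over 20 years = 5 doublings, i.e. roughly one doubling every 4 years.
    today_gb = 512                 # assumed high-end machine today
    growth_over_20_years = 2 ** 5  # the 32x observed over the last 20 years
    print(today_gb * growth_over_20_years / 1024, "TB in 2045")  # -> 16.0 TB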
See the Microsoft ecosystem as an example. Nothing they do could not be replicated, but the network effects they achieved are strong. Too much glue, and 3rd party systems, and also training, and what users are used to, and what workers you could hire are used to, now all point to the MS ecosystem.
In this early mass-AI-use phase you can still easily switch vendors, sure. Just like in the 1980s you could still choose some other OS or office suite (like StarOffice, the basis for OpenOffice, or Lotus, WordStar, WordPerfect) without paying that kind of ecosystem cost, because it did not exist yet.
Today too much infrastructure and software relies on the systems from one particular company to change easily, even if the competition were able to provide a better piece of software in one area.
Still, I suppose that's better than what Nvidia has on offer atm (even if a rack of GPUs gives you much, much higher memory throughput).
In some cases it's more cost-effective to get M-series Mac Minis than Nvidia GPUs.
There are all kinds of trade-offs that the car person and the non-car person make, for better or worse depending on the circumstance. The non-car person may miss out on a hobby, or not know why road trips are neat, but they don't have the massive physical and financial liabilities that come with them. The car person meanwhile, in addition to the aforementioned issues, might forget how to grocery shop in smaller quantities, or how to engage with people out in the world because they just go from point A to B in their private vessel, but they may theoretically engage in more distant, varied activities that the non-car person would have to plan for further in advance.
Taking the analogy a step further, each party gradually sets different standards for themselves that push the two archetypes into diametrically opposed positions. The non-car owner's life doesn't just not depend on cars, but is often actively made worse by their presence. For the car person, the presence of people, especially those who don't use a car, gradually becomes over-stimulating; cyclists feel like an imposition, people walking around could attack at any moment, even other cars become the enemy. I once knew someone who'd spent his whole life commuting by car, and when he took a new job downtown, had to confront the reality that not only had he never taken the train, he'd become afraid of taking it.
In this sense, the rise of LLMs does remind me of the rise of frontend frameworks, bootcamps that started with React or React Native, high-level languages, and even things like having great internet; the only people who ask what happens in a less ideal case are the ones who've either dealt with those constraints first-hand or have tried to simulate them. If you've never been to the countryside, or a forest, or a hotel, you might never consider how your product behaves in a poor-connectivity environment, and these are the people who wind up getting lost on basic hiking trails, having assumed their online map would produce relevant information and always be there.
Edit: To clarify, in the analogy, it's clear that cars are not intrinsically bad tools and are worthwhile inventions, but had excitement for them been tempered during their rise in commodification and popularity, the feedback loops that ended up all but forcing people to use them in certain regions could have been broken more easily.
* Hmm, what kind of software do you write that pays your bills?
* And your setup doesn't require any external infrastructure to be kept up to date?
The point being made here is that a developer that can only do their primary job of coding via a hosted LLM is entirely dependent on a third party.
$200-300/month is already $7k or more over 3 years.
And I do expect some hardware, chip-based models in a few years, something like a GPU.
An "AIPU", where you can replace the hardware AI chip.
Open source, of course.
So what's my response when that gets deprecated? Maintaining it myself? Nope, finding another library.
You always depend on something...
You say that like it's an absurd idea, but in fact this is what most companies would do.
True, but I think wanting to avoid yet another dependency is a good thing.
And with $10k I could pay for 40 years of a Claude subscription. A much smarter and faster model.
You make a good point, of course, that independence is important. But first, this ship sailed long ago; second, more than one party provides the service you depend on. If one fails you still have at least some alternatives.
And I have worked in plenty of companies. I'm the open source guy in these companies, and neither I nor my teams ever had the capacity to do so.
That said, I only find Google results somewhat helpful. It's a lot like LLM code (not surprising, given how they're trained): I may find 5 answers online, and one or two have a small piece of what I need. Ultimately that may save me a bit of time or give me an idea for something I hadn't thought of, but it isn't core to my daily work by any stretch.
Spoken from a fair bit of experience doing software development in closed rooms with strict control of all digital devices (from your phone to your watch) and absolutely no external connections.
There are moments that are painful still, because you'll be trying to find a thing in a manual and you know a search can get it faster - but it's silly to imply this isn't possible.
And it's not like people weren't able to develop complicated software before the internet. They just had big documentation books that cost money and could get dated quickly. To be clear, having that same info a quick google search away is an improvement, and I'm not going to stop using google while it's available to me. But that doesn't mean we'd all be screwed if google stopped existing tomorrow.
> 200-300$/month are already 7k in 3 years.
Except at current crazy rates of improvement, cloud-based models will in reality likely be ~50x better, and you'll still be stuck with the same local system.
And it feels strange, because I am constantly asking people what books they're reading.
I agree, and we will see how this plays out, but I hope models start to become more efficient, so that for certain things it might not matter that much to run some parts locally.
I could imagine an LLM with far fewer languages, optimized for one programming language. Something like "generate your own model".
2.5 years ago it could just about run LLaMA 1, and that model sucked.
Today it can run Mistral Small 3.1, Gemma 3 27B, Llama 3.3 70B - same exact hardware, but those models are competitive with the best available cloud-hosted model from two years ago (GPT-4).
The best hosted models (o3, Claude 4, Gemini 2.5, etc.) are still way better than the best models I can run on my 3-year-old laptop, but the rate of improvement for those local models (on the same system) has been truly incredible.
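For anyone who wants to reproduce that kind of experiment, here's a minimal sketch using the Ollama Python client (assumes Ollama is installed, its daemon is running, and the example model tag has already been pulled):

    # Chat with a locally hosted model; everything runs on your own hardware.
    # Requires `pip install ollama` plus a running Ollama daemon.
    import ollama

    response = ollama.chat(
        model="gemma3:27b",  # example tag; pick whatever fits your RAM
        messages=[{"role": "user", "content": "Write a haiku about local compute."}],
    )
    print(response["message"]["content"])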
This sounds a bit like bailing out the ocean.
My company has set this up for one of our customers (I wasn't involved).
In fact, MCP is so groundbreaking that I consider it to be the actual meat and potatoes of coding AIs. Large models are too monolithic, and knowledge is forever changing. Better to just use a small 14B model (or even 8B in some cases!) with some MCP search tools, a good knowledge graph for memory, and a decent front end for everything. Let it teach itself based on the current context.
And all of that can run on an off the shelf $1k gaming computer from Costco. It’ll be super slow compared to a cloud system (like HDD vs SSD levels of slowness), but it will run in the first place and you’ll get *something* out of it.
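To illustrate how small the MCP side can be, here's a sketch of a single tool server using the official Python SDK's FastMCP helper; the "search" is a stub rather than a real index, and the server name is made up:

    # A toy MCP server exposing one tool for a local model's agent loop to call.
    # Requires `pip install mcp`.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("local-search")

    @mcp.tool()
    def search_docs(query: str) -> str:
        """Return placeholder search results for the query."""
        return f"No real index wired up yet; you asked for: {query}"

    if __name__ == "__main__":
        mcp.run()  # defaults to the stdio transport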
I mostly write JS today and it either runs in browsers (dependencies) or on a host like AWS (dependencies). I use VSCodium and a handful of plugins (dependencies).
These all help me work efficiently when I'm coding, or help me avoid infrastructure issues that I don't want to deal with. Any one part is replaceable though, and more importantly any one part isn't responsible for doing my entire job of creating and shipping code.
Folks that are running local LLMs every day now will probably say you can basically emulate at least Sonnet 3.7 for coding if you have a real AI workstation. Which may be true, but the time, effort, and cost involved are substantial.
If it's one individual doing this, sure. I am posting this in the hopes that others follow suit.
To be fair, the entire internet is basically this already.
FOSS is more about:
1. Finding some software you can use for your problem.
2. Hitting an issue with your particular use case.
3. Downloading the code and fixing the issue.
4. Cleaning up the patch and sending a proposal to the maintainer (as sketched below). A PR is easy, but email is OK too. You can even use a pastebin service and post it on a forum (suckless does that in part).
5. The maintainer merges the patch and you revert to the official version, or they don't and you decide to go with your fork.
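A rough sketch of step 4, assuming your fix lives as commits on top of origin/main (the branch name and maintainer address are hypothetical):

    # Turn local commits into mailable patch files, then send them upstream.
    import subprocess

    subprocess.run(
        ["git", "format-patch", "origin/main", "-o", "patches/"],
        check=True,
    )
    subprocess.run(
        ["git", "send-email", "--to=maintainer@example.org", "patches/"],
        check=True,
    )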
Therefore using your own bare metal is a lot of expensive redundancy.
The cloud provider, on the other hand, can keep the GPU utilised enough to make it pay. They can also subsidise it with VC money :)
For the past few years, we've been "getting smaller" by getting deeper. The diameter of the cell shrinks, but the depth of the cell goes up. As you can imagine, that doesn't scale very well: cutting the cylinder diameter in half means doubling the depth to keep the same sidewall surface area, and therefore the same capacitance.
If you try to put the cells closer together, you start to get quantum tunneling, where electrons disappear from one cell and appear in another, altering charges in unexpected ways.
The times of massive memory shrinks are over. That means we have to reduce production costs and have more chips per computer or find a new kind of memory that is mass producible.
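A quick back-of-the-envelope check of the geometry above, treating the cell capacitor as a simple cylinder whose sidewall area sets the capacitance (the numbers are illustrative only):

    # Sidewall area of a cylinder ~ pi * d * h, so keeping the area constant
    # while halving the diameter means doubling the depth: aspect ratios explode.
    import math

    def depth_for_same_area(area: float, diameter: float) -> float:
        return area / (math.pi * diameter)

    area = math.pi * 1.0 * 10.0  # reference cell: d = 1, h = 10 (arbitrary units)
    for d in (1.0, 0.5, 0.25):
        print(f"d={d}: depth={depth_for_same_area(area, d):.1f}")
    # d=1.0 -> 10.0, d=0.5 -> 20.0, d=0.25 -> 40.0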
This actually to me implies the opposite of what you’re saying here. Why bother relearning the state of the art every few months, versus waiting for things to stabilize on a set of easy-to-use tools?
I'm pretty sure the connotation of "self-host" entails a substantially smaller scope than starting your own ISP.
Finding someone willing to peer with you also defeats the purpose. You are still fundamentally dependent on established ISPs.
It's not off-grid, but that's the eventual dream/goal.
And many do. The US isn't the entire world, you know.
> ...what kind of software do you write that pays your bills?
B2B software that allows anyone to run their workloads with most any cloud provider, and most any on-prem "cloud". The entire point of this software is to abstract out the underlying infrastructure so that businesses can walk away from a particular vendor if that vendor gets too stroppy.
> ...your setup doesn't require any external infrastructure...
It's Gentoo Linux, so it runs largely on donated infra (and infra paid for with donations). But -unlike Windows or OS X users- if I get sick of what the Gentoo steering committee are doing, I can go to another distro (or just fucking roll my own should things get truly dire). That's the point of my comment.
So I guess, broadly speaking, there could be strategies that involve trying to influence government policy rather than relying on consumer choice.
Or more radically, trying to change the structure of the government in general such that the above influences actually are more tractable for the common person.