Sure, it was easier to do it myself. But putting in the time to train, give context, develop guardrails, learn how to monitor, etc. ultimately taught me the skills needed to delegate effectively and massively multiply the team's output as we added people.
It's early days but I'm getting the same feeling with LLMs. It's as exhausting as training an overconfident but talented intern, but if you can work through it and somehow get it to produce something as good as you would do yourself, it's a massive multiplier.
But you're not really training LLMs as you use them - do you mean it's best to develop your own skill at using LLMs in an area you already understand well?
I'm finding it a bit hard to square your comment about it being exhausting to cat-herd the LLM with it being a force multiplier.
You just explained how your work got a big multiplier. At the end of training an intern you get a trained intern -- potentially a huge multiplier. ChatGPT is like an intern you can never train and who will never get much better.
These are the same people who would no longer create or participate deeply in OSS (a +100x multiplier), bragging about the +2x multiplier they got in exchange.
If you know what you're doing you can still "teach" them, but it's on you to do it - you need to keep iterating on things like the system prompt you use and the context you feed into the model.
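To make that concrete, here's a minimal sketch of what that iteration loop tends to look like in practice: the "teaching" lives entirely in strings you assemble, not in the model's weights. call_llm, the prompt contents, and build_request are hypothetical stand-ins, not any particular vendor's API.

    SYSTEM_PROMPT_V1 = "You are a helpful coding assistant."

    # After a few bad outputs, fold the corrections back into the prompt:
    SYSTEM_PROMPT_V2 = (
        "You are a helpful coding assistant.\n"
        "- Prefer the standard library over third-party packages.\n"
        "- Ask a clarifying question before writing more than 50 lines."
    )

    def build_request(system_prompt: str, context_docs: list[str], task: str) -> str:
        """Assemble the full prompt: all the 'teaching' lives in this string."""
        context = "\n\n".join(context_docs)
        return f"{system_prompt}\n\nContext:\n{context}\n\nTask: {task}"

    # response = call_llm(build_request(SYSTEM_PROMPT_V2, docs, "Refactor foo()"))
    # call_llm is a hypothetical client; swap in whatever API you actually use.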
Humans really like to anthropomorphize things. Loud rumbles in the clouds? There must be a dude on top of a mountain somewhere who's in charge of it. Impressed by that tree? It must have a spirit that's like our spirits.
I think a lot of the reason LLMs are enjoying such a huge hype wave is that they invite that sort of anthropomorphization. It can be really hard to think about them in terms of what they actually are, because both our head-meat and our culture have so much support for casting things as other people.
Edit: grammar
With LLMs, the better I get at the scaffolding and prompting, the less it feels like cat-herding (so far at least). Hence the comparison.
Makes me wonder whether, if there had been equal investment in specialized tools using more fine-tuned statistical methods (like supervised learning), we would have something much better than LLMs.
I keep thinking about spell checkers and auto-translators, which have been using machine learning for a while, with pretty impressive results (unless I'm mistaken, I think most of those use supervised learning models). I have no doubt we will start seeing companies replace these proven models with an LLM, with a noticeable reduction in quality.
If an LLM learned something when you gave it commands, it would probably be reflected in adjusted weights in some of its operational matrices. This is true of human learning: we strengthen some neural connection, and when we receive a similar stimulus in a similar situation sometime in the future, the new stimulus will follow a slightly different path along its neural pathway and result in an altered behavior (or at least have a greater probability of an altered behavior). For an LLM to "learn" I would like to see something similar.
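For what it's worth, that distinction is easy to show in code. Here's a toy PyTorch sketch (a generic one-layer model, nothing to do with how any real LLM is served): a forward pass leaves the weights untouched, and only an explicit gradient update changes them.

    import torch

    model = torch.nn.Linear(4, 1)
    before = model.weight.clone()
    x = torch.randn(8, 4)

    # "Using" the model: a forward pass. Weights are unchanged afterwards.
    with torch.no_grad():
        _ = model(x)
    assert torch.equal(model.weight, before)

    # "Learning": a loss, a backward pass, and an optimizer step adjust weights.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = (model(x) - torch.randn(8, 1)).pow(2).mean()
    loss.backward()
    opt.step()
    assert not torch.equal(model.weight, before)

Inference on a deployed LLM is only the first half; whatever it appears to "learn" in a chat never makes it into the second.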
I wrote a bit about that here - I've turned it off: https://simonwillison.net/2025/May/21/chatgpt-new-memory/
Admittedly, you have to wrap LLMs with extra machinery to get them to do that. If you want to rewrite the rules to exclude that, then I will have to revise my statement that it is "mostly, but not completely true".
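By "wrapping" I mean something like the sketch below: the model behind the API stays frozen, and a thin layer re-feeds prior exchanges as context so the system as a whole appears to remember. MemoryWrapper and call_llm are made-up names for illustration.

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for a stateless API call to a frozen model.
        return "..."

    class MemoryWrapper:
        def __init__(self):
            # (user, assistant) turns live out here, not in the model.
            self.history = []

        def ask(self, user_msg: str) -> str:
            past = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.history)
            reply = call_llm(f"{past}\nUser: {user_msg}\nAssistant:")
            self.history.append((user_msg, reply))  # the "memory" is just this list
            return reply

Delete the wrapper's history and the "memory" is gone; the model itself never changed.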
:-P
I think SRS schedulers are a good example of a machine learning algorithm that learns from its previous interactions. If you run the optimizer you will end up with a different weight matrix, and flashcards will be scheduled differently. It has learned how well you retain these cards. But an LLM that is simply following orders has not learned anything, unless you feed the previous interaction back into the system to alter future outcomes, regardless of whether it "remembers" the original interactions. With the SRS, your review history can be completely forgotten about: you could delete it, but the weight matrix keeps the optimized weights. If you delete your chat history with ChatGPT, it will not behave any differently based on the previous interaction.
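To illustrate the difference, here's a toy sketch assuming a simple exponential forgetting curve R = exp(-t/s) with a single stability weight s (real schedulers like FSRS fit a much richer model): the review log is consumed once to fit the weight, and can then be deleted; the learning persists in s.

    import math

    # Review history: (days since last review, recalled?) pairs.
    history = [(1, 1), (3, 1), (2, 1), (7, 0), (10, 0)]

    s = 5.0  # memory "stability", the single weight we optimize
    for _ in range(500):  # gradient ascent on the log-likelihood
        grad = 0.0
        for t, recalled in history:
            r = math.exp(-t / s)  # predicted recall probability
            if recalled:
                grad += t / s**2                 # d/ds of log(r)
            else:
                grad -= r * t / s**2 / (1 - r)   # d/ds of log(1 - r)
        s += 0.1 * grad

    # The history can now be thrown away; the learning lives in s.
    print(f"fitted stability: {s:.2f} days")
    print(f"next review in {-s * math.log(0.9):.2f} days (predicted 90% recall)")

An LLM chat has no analogous step: delete the transcript and nothing fitted survives.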