zlacker

I think if you use a very loose definition of learning: A stimuli which alters subsequent behavior you can claim this is learning. But if you tell a human to replace the word “is” with “are” in the next two sentences, this could hardly be considered learning, rather it is just following commands, even though it meets the previous loose definition. This is why in psychology we usually include some timescale for how long the altered behavior must last for it to be considered learning. A short-term altered behavior is usually called priming. But even then I wouldn’t even consider “following commands” to be neither priming nor learning, I would simply call it obeying.

If an LLM learned something when you gave it commands, it would probably be reflected in some adjusted weights in some of its operational matrix. This is true of human learning, we strengthen some neural connection, and when we receive a similar stimuli in a similar situation sometime in the future, the new stimuli will follow a slightly different path along its neural pathway and result in a altered behavior (or at least have a greater probability of an altered behavior). For an LLM to “learn” I would like to see something similar.

replies(1): >>rerdav+MX1

>>runarb+(OP)
I think you have an overly strict definition of what "learning" means. ChatGPT now has memory that lasts beyond the lifetime of it's context buffer, and now has at least medium term memory. (Actually I'm not entirely sure that they are not just using long persistent context buffers, but anyway).

Admittedly, you have to wrap LLMs to with stuff to get them to do that. If you want to rewrite the rules to excluded that then I will have to revise my statement that it is "mostly, but not completely true".

:-P

replies(1): >>runarb+Mo2

>>rerdav+MX1
You also have to alter some neural pathways in your brain to follow commands. That doesn’t make it learning. Learned behavior is usually (but not always) reflected in long term changes to neural pathways outside of the language centers of the brain, and outside of the short-term memory. Ones you forget the command, and still apply the behavior, that is learning.

I think SSR schedulers are a good example of a Machine Learning algorithms that learns from it’s previous interactions. If you run the optimizer you will end up with a different weight matrix, and flashcards will be schedule differently. It has learned how well you retain these cards. But an LLM that is simply following orders has not learned anything, unless you feed the previous interaction back into the system to alter future outcomes, regardless of whether it “remembers” the original interactions. With the SSR, your review history is completely forgotten about. You could delete it, but the weight matrix keeps the optimized weights. If you delete your chat history with ChatGPT, it will not behave any differently based on the previous interaction.