What you fail to appreciate is that the operation of an LLM is driven by the input data far more than is the case with most programs. A typical program has a lot of business logic that determines its behavior--rules, as you say. E.g., an optimizing compiler has a large number of hand-crafted optimizations, each invoked when code matches the pattern it was written for. But LLMs have no programmed cases or rules like that--the core algorithms are input-agnostic. All of the variability in the output is purely a reflection of patterns in the input; the programmers never made any decision like "if this word is seen, do this."
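To make the contrast concrete, here is a toy sketch (not any real system's code--the matrices, sizes, and mean-pooling stand-in for attention are all made up for illustration). The first function is rule-driven: a programmer wrote the specific pattern it reacts to. The second is an input-agnostic decode step: the exact same arithmetic runs no matter which tokens arrive, and the only thing that differs across inputs is the numbers flowing through it.

```python
import numpy as np

# --- Rule-driven program: behavior comes from hand-written cases ---
def rule_based_rewrite(tokens):
    """Toy "optimizer": hand-crafted patterns decide what happens."""
    out, i = [], 0
    while i < len(tokens):
        # Programmer-chosen rule: "x * 2" becomes "x << 1"
        if i + 2 < len(tokens) and tokens[i + 1] == "*" and tokens[i + 2] == "2":
            out += [tokens[i], "<<", "1"]
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out

# --- Input-agnostic "LLM-style" step: same code path for every token ---
# Hypothetical toy sizes; a real model just has far larger learned matrices.
rng = np.random.default_rng(0)
vocab, d = 50, 8
E = rng.normal(size=(vocab, d))   # embedding table (learned, not hand-written)
W = rng.normal(size=(d, vocab))   # output projection (learned)

def next_token_logits(token_ids):
    """One decode step: identical computation regardless of which tokens appear."""
    h = E[token_ids].mean(axis=0)  # pool the context (toy stand-in for attention)
    return h @ W                   # scores over the whole vocabulary

print(rule_based_rewrite(["x", "*", "2"]))                 # rule fires: ['x', '<<', '1']
print(next_token_logits(np.array([3, 17, 42])).argmax())   # no token-specific branch anywhere
```

Nothing in the second half branches on a particular word; swap in different token ids and the same matrix multiplies run. That's the sense in which the behavior lives in the data rather than in the code.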