They most definitely don't. We attach symbolic meaning to their output because we can map it semantically to the input we gave it. Which is why people are often caught by surprise when these mappings break down.
LLMs can emulate reasoning, but the failure modes show that they don't. We can get them to be coincidentally emulating reasoning well enough long enough to fools us, investors and the media. But doubling down on it hoping that this problem goes away with scale or fine tuning is proving more and more reckless.