I don't really see how this is different from "LLMs can't multiply 20 digit numbers"--which btw, most humans can't either. I tried it once (using pen and paper) and consistently made errors somewhere.
Yeah, and FWIW doing this through writing code is trivial in an LLM / LRM - after testing locally took not even a minute to have a working solution no matter the amount of disks.
Your analogy makes sense, no reasonable person would try to solve a Tower of Hanoi type problem with e.g. 15 disks and sit there for 32,767 moves non-programmatically.