1. ChatGPT knows the algorithm for adding two numbers of arbitrary magnitude.
2. It often fails to apply that algorithm and instead hallucinates the result.
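For reference, the algorithm point 1 alludes to is just digit-by-digit schoolbook addition with carries, which handles numbers of any magnitude when they are treated as digit strings. A minimal sketch (the function name `add_digits` is my own, purely illustrative):

```python
def add_digits(a: str, b: str) -> str:
    """Schoolbook addition of two non-negative decimal integers given as strings."""
    a, b = a[::-1], b[::-1]  # work from the least-significant digit
    carry, out = 0, []
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        carry, digit = divmod(da + db + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))
```

Executing this deterministically always yields the right answer; the point is that an LLM sampling tokens is not executing it.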
Knowing something doesn't mean it will get it right every time. On the contrary, an LLM is almost guaranteed to err some of the time because of the probabilistic nature of its sampling. But that alone doesn't prove it merely brute-forced task X.