zlacker

1. motobo (OP) 2025-12-06 22:15:27
The knowledge is probably in the pre-training data (the internet documents the LLM is trained on to get a good grasp), but probably very poorly represented in the reinforcement learning phase.

Which is to say that Anthropic probably doesn't have good training documents and evals to teach the model how to do that.

Well they didn’t. But now they have some.

If the author wants to improve his efficiency even more, I'd suggest he start building tools that let a human produce a text trace of a good run at decompiling this project.

Those traces can be hosted somewhere Anthropic can see, and then after the next model's pre-training there's a good chance the model becomes even better at this task.
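To make the idea concrete, here's a minimal sketch of what such a trace tool could look like (all names here are hypothetical, not from any real tool): a tiny logger that records each human action during a decompilation session as timestamped plain text, which could later be published for crawling.

```python
# Hypothetical sketch: record each step of a decompilation session
# as a plain-text trace suitable for later publishing.
import datetime
import io

class TraceLogger:
    """Appends timestamped step records to a text stream."""
    def __init__(self, stream):
        self.stream = stream

    def log(self, action, detail):
        # One line per step: timestamp, the action taken, and its result.
        ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
        self.stream.write(f"[{ts}] {action}: {detail}\n")

# Example session: log what was tried and what worked.
buf = io.StringIO()
trace = TraceLogger(buf)
trace.log("disassemble", "sub_4010 -> 42 instructions")
trace.log("rename", "sub_4010 -> parse_header (reads magic bytes)")
print(buf.getvalue())
```

The point isn't the format itself but that the reasoning behind each step ends up written down, which is exactly the kind of document that's scarce in pre-training data today.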
