My guess is that the agent's context retrieval is the weak link: it isn't giving the LLM the right context. Today's models should be capable enough to get the job done.
Does anyone know which model was used in these PRs? Copilot supports a variety of models: https://github.blog/ai-and-ml/github-copilot/which-ai-model-...