I don't think we can just assume that training on this exact form of questioning will lead to strong performance on such questions. For one thing, given LLMs' propensity to hallucinate, I am not confident that an LLM trained this way will reliably apply the correct model when answering a given question.