The problem with your scheme is that GPT is already able to answer yes, and the only reason ChatGPT doesn't is that OpenAI has deliberately bent it not to. In the absence of reinforcement learning, large language models spit out text according to how they think text normally goes. It's a text generation procedure that has nothing to do with beliefs or inner thoughts; all the model knows about is what word comes next.
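A toy way to picture "all the model knows about is what word comes next" is a bigram sampler: it picks each next word only from counts of what followed the previous word in its training text. This is a deliberately crude stand-in chosen for illustration. GPT uses a transformer conditioned on the whole preceding context, not bigram counts, but the generation loop has the same one-word-at-a-time shape. The function names here are made up for the example.

```python
import random
from collections import defaultdict


def train_bigrams(text):
    # Count, for every word, which words followed it and how often.
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, seed=0):
    # Emit one word at a time, each chosen only from what followed the previous word.
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # never saw anything follow this word, so stop
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        out.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(out)
```

There are no beliefs anywhere in this loop, only "given the last word, what usually comes next"; the open question in this thread is whether a far larger model of the same shape ends up representing more than that.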
How do you know that? To be clear, I totally agree that ChatGPT is not there yet, and I also think that scaling it up will not get us there, but I think the gap might be much smaller than most people think.
The way we think is often divided into two categories: for simple things like 1 + 1 you just know the answer, while for complicated things like 13 * 47 you really have to think and reason in steps. ChatGPT seems to do pretty well in the first category, but it is not really capable of doing things in the second category. On the other hand, I have seen examples of people talking ChatGPT through a reasoning process until it arrived at the correct answer for something it initially got wrong, for example ROT13-encoding some text.
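To make the "just know it" versus "reason in steps" distinction concrete, here is a small Python sketch that writes out the intermediate steps for 13 * 47 and for ROT13. It only illustrates what a step-by-step procedure looks like; it says nothing about how ChatGPT computes anything, and the function names are made up for this example.

```python
import string


def multiply_in_steps(a, b):
    # Split b into tens and ones, the way one might work out 13 * 47 on paper.
    tens, ones = divmod(b, 10)
    part1 = a * tens * 10
    part2 = a * ones
    steps = [
        f"{a} * {tens * 10} = {part1}",
        f"{a} * {ones} = {part2}",
        f"{part1} + {part2} = {part1 + part2}",
    ]
    return part1 + part2, steps


def rot13_in_steps(text):
    # Shift each ASCII letter 13 places in the alphabet, recording every substitution.
    out, steps = [], []
    for ch in text:
        if ch in string.ascii_lowercase:
            base = ord("a")
        elif ch in string.ascii_uppercase:
            base = ord("A")
        else:
            out.append(ch)  # punctuation and spaces pass through unchanged
            continue
        shifted = chr((ord(ch) - base + 13) % 26 + base)
        steps.append(f"{ch} -> {shifted}")
        out.append(shifted)
    return "".join(out), steps
```

For example, `multiply_in_steps(13, 47)` returns `611` with the steps `13 * 40 = 520`, `13 * 7 = 91`, `520 + 91 = 611`, and `rot13_in_steps("Hello")` returns `"Uryyb"`. Each individual step is the "just know it" kind; only the chaining requires reasoning.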
So what if we stuff two copies of ChatGPT into a black box and, instead of just spitting out whatever ChatGPT spits out, we let the two copies first have some inner dialogue? I don't think it is perfectly obvious how one would do this or what the result would be, but I think ChatGPT has enough basic knowledge that there is at least a chance one could get it to reason in a step-by-step fashion.
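One possible reading of the "two copies in a black box" idea, sketched in Python under loud assumptions: `solver` and `critic` are hypothetical callables standing in for two ChatGPT instances (no real API is used), and the prompts and the stop rule (the critic replies "OK") are made up. Only the final answer leaves the box; the transcript is the inner dialogue.

```python
from typing import Callable, List

# Hypothetical: any function that takes a prompt and returns a model's reply.
AskModel = Callable[[str], str]


def inner_dialogue(question: str, solver: AskModel, critic: AskModel, rounds: int = 3) -> str:
    # The transcript is the private "inner dialogue"; it never leaves the box.
    transcript: List[str] = [f"Question: {question}"]
    answer = solver("\n".join(transcript) + "\nThink step by step, then answer.")
    transcript.append(f"Solver: {answer}")
    for _ in range(rounds):
        # Second copy checks the latest answer (assumed convention: reply starting with OK means accept).
        critique = critic(
            "\n".join(transcript)
            + "\nCheck the last answer. Reply OK if it is correct, otherwise point out the mistake."
        )
        transcript.append(f"Critic: {critique}")
        if critique.strip().upper().startswith("OK"):
            break
        # First copy revises in light of the critique.
        answer = solver("\n".join(transcript) + "\nRevise your answer using the critique.")
        transcript.append(f"Solver: {answer}")
    # Only the final answer is what the outside world sees.
    return answer
```

Whether real model copies would converge on correct answers this way, rather than agreeing on a wrong one, is exactly the open question; the sketch only shows the plumbing.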
I think I know enough about how neural networks work, even though I could not tell you in any detail what the exact layer structure is, which activation functions they use, how the attention mechanism is built up, or what training procedure they use. But why does it matter how well I understand the details?