You might be right: an LLM alone doesn't improve by itself. But when it is part of a system like GPTs, it can use web search, local RAG, and code execution, and also get human guidance and corrections. That is clearly a superior setup compared with the LLM alone. I believe that is why OpenAI created GPTs: to lift a model at level N to level N+1.