The article links multiple papers on chain-of-thought reasoning. There are tasks that language models struggle with, yet when you ask a model to explain its reasoning first, certain large language models do far better than scaling on the plain prompt would suggest. Calling this an 'ascribed quality' is crazy; it's just an observation and says nothing about the internals. Hell, you could even test it yourself if you don't trust the papers (something like the sketch below).
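A minimal sketch of such a test, assuming the OpenAI Python client; the model name, the question, and the exact prompt wording are placeholders, and any chat-style API would work the same way. Compare the direct answer against the "think step by step" version on a problem the model usually fumbles:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = (
    "A juggler has 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

def ask(prompt: str) -> str:
    # One-shot chat completion; no system prompt, temperature left at default
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; swap in whatever model you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Direct prompt: ask for the answer alone
direct = ask(QUESTION + "\nAnswer with just a number.")

# Chain-of-thought prompt: ask the model to reason before answering
cot = ask(QUESTION + "\nLet's think step by step, then give the final number.")

print("direct:", direct)
print("chain of thought:", cot)
```

Run it over a batch of harder problems and tally the accuracy of each variant; the papers' claim is that the gap is large for some big models and roughly zero for small ones.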
Saying that it just looks like P(text|internet) is a tautology: it's a text predictor trained on the internet. That tells you nothing about why phenomena like the above occur, or why they show up only in some large language models and not in others.