And yet it is often able to make surprising references to previous text. It is not just a Markov chain, and it is capable of what the author describes as chain of thought. I think there are deeper relationships encoded in the model that allow it to keep to a consistent narrative for a very long time. Its beliefs may change between queries but, generally, do not change within the context of a single conversation.
The attention mechanism lets it look backwards to "understand" what was said before and predict what could possibly come next. Whatever consistency it has is due to studying the preceding text.
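The backwards-only lookup can be sketched as causal attention: each position mixes information from itself and earlier positions, never later ones. This is a toy single-head version with made-up inputs, not any particular model's implementation:

```python
import numpy as np

def causal_attention(q, k, v):
    """Toy single-head attention: each position attends only to itself
    and earlier positions (a causal mask), so the model can 'look
    backwards' at the preceding text but never forwards."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask out future positions before the softmax.
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))        # 5 token embeddings of width 4
out, w = causal_attention(x, x, x)
print(np.round(w, 2))              # row i is zero beyond column i
```

Each row of `w` is a probability distribution over the tokens seen so far, which is the sense in which whatever consistency the model has comes from studying the preceding text.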
Thinking ahead is different. All it needs to do is calculate the probability that there is any reasonable completion starting with a particular word. It doesn't need to decide what it's going to say beyond that; it can decide later.
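Deciding later can be sketched as sampling one word at a time from a next-word distribution. The prefix and probabilities here are invented for illustration; real models score a whole vocabulary, but the one-step-at-a-time logic is the same:

```python
import random

# Hypothetical next-word distribution after the prefix "The cat sat on the".
# The model only scores one word ahead; it never commits to a full sentence.
next_word_probs = {"mat": 0.55, "sofa": 0.25, "roof": 0.15, "moon": 0.05}

def pick_next_word(probs, rng):
    """Sample one word in proportion to its probability.
    Any word with nonzero mass admits some reasonable continuation;
    what follows it is decided only on the next step."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(42)
sentence = "The cat sat on the " + pick_next_word(next_word_probs, rng)
```

Note that nothing in `pick_next_word` plans past the current word; the next call starts fresh from whatever prefix now exists.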
Have you ever played a game where players take turns adding one more word to a sentence? When it's your turn and you're choosing the next word, you don't need to think ahead very much. Also, you don't necessarily need to have the same thing in mind as the player who went before you.
In improv there is a "yes, and" principle, where you are always building on what happened before. These algorithms are doing improv all the time.
The algorithm doesn't know or care who wrote the words that came before. It will find a continuation regardless.