People really need to update their model of what a "statistical predictor" can accomplish.
We know that Transformers are universal approximators of sequence-to-sequence functions[1], so any structure that can be encoded into a sequence-to-sequence map can be modeled by Transformer layers. It follows that prediction and modeling are not categorically distinct capacities in LLMs but points on a continuum. How well the model predicts in a given instance depends largely on how much relevant data was available during training. This is the basis for the beginnings of genuine understanding in LLMs; I discuss this at some length here[2]. Odd failures and hallucinations are just the model responding from different points along the prediction-modeling spectrum.
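For concreteness, the result in [1] says roughly the following (paraphrasing the positional-encoding case from memory; the paper has the exact hypotheses and constants):

    % F_CD: continuous functions f : [0,1]^{d x n} -> R^{d x n}
    %       (sequence-to-sequence maps on a compact domain)
    % T:    Transformer networks with positional encodings
    \forall \epsilon > 0,\ \forall f \in \mathcal{F}_{CD}\
      \exists\, g \in \mathcal{T} :\quad d_p(f, g) \le \epsilon,
    \quad\text{where}\quad
    d_p(f, g) = \Big( \int \lVert f(X) - g(X) \rVert_p^p \, dX \Big)^{1/p}

Worth noting: this is a statement about expressivity, not about what training actually finds, which is where the point about data availability above does the work.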
[1] https://arxiv.org/abs/1912.10077
[2] https://www.reddit.com/r/naturalism/comments/1236vzf/on_larg...