Hacker News

People really need to update their model of what a "statistical predictor" can accomplish. We know that Transformers are universal approximators of sequence-to-sequence functions[1], so any structure that can be encoded into a sequence-to-sequence map can be modeled by Transformer layers. It follows that prediction and modeling are not categorically distinct capacities in LLMs; they exist on a continuum. How well the model predicts in a given instance is largely a function of how much relevant data was available during training. This is the basis for the beginnings of genuine understanding in LLMs. I discuss this at some length here[2]. Odd failures and hallucinations are just the model responding from different points along the prediction-modeling spectrum.

[1] https://arxiv.org/abs/1912.10077

[2] https://www.reddit.com/r/naturalism/comments/1236vzf/on_larg...
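To make the "sequence-to-sequence map" framing concrete, here is a minimal NumPy sketch of single-head self-attention, the core operation the universal-approximation result in [1] is about. All names and dimensions are illustrative, not from the paper; the point is just that a Transformer layer is literally a function from a length-n sequence of vectors to another length-n sequence:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: maps a length-n sequence of
    d-dim vectors to another length-n sequence of d-dim vectors,
    i.e. a sequence-to-sequence map."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n, n) attention logits
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # (n, d) output sequence

n, d = 5, 8
X = rng.normal(size=(n, d))                   # input sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Y = self_attention(X, Wq, Wk, Wv)
print(Y.shape)                                # same length in, same length out
```

Stacking such layers (with MLPs and positional encodings) is what the cited theorem shows can approximate arbitrary continuous sequence-to-sequence functions on a compact domain.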



There are also limits to the sequence-to-sequence functions they can effectively learn; on some formal-language tasks they are weaker than LSTMs:

https://arxiv.org/abs/2207.02098
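For a sense of what "weaker" means here: the linked paper probes length generalization on formal-language tasks, and running parity is a classic example of the kind of simple finite-state sequence-to-sequence task that recurrent models handle naturally but Transformers often fail to length-generalize on. A sketch of such a task's data generation (my own illustration, not code from the paper):

```python
import random

def parity_labels(bits):
    """Running parity: label[i] is the XOR of bits[0..i].
    Computable by a 2-state automaton, yet a known stress test
    for Transformer length generalization."""
    out, acc = [], 0
    for b in bits:
        acc ^= b
        out.append(acc)
    return out

def make_example(length, rng=random):
    """One (input sequence, target sequence) training pair."""
    bits = [rng.randint(0, 1) for _ in range(length)]
    return bits, parity_labels(bits)

print(parity_labels([1, 0, 1, 1]))  # [1, 1, 0, 1]
```

The usual experimental setup is to train on short sequences and evaluate on longer ones; an LSTM can carry the single parity bit in its state for arbitrary lengths, which is where the gap shows up.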




