I don't know much about EBM, but I have done a bit of research on the language modelling side. And let's be honest, GPT is not the best learner we know of right now (that would be ourselves). GPT needs far more data and energy than a human does, so clearly there is a better architecture somewhere waiting to be discovered.
Attention works, yes. But it is not biologically plausible at all. We don't do quadratic comparisons across a whole book, and we don't need to see thousands of samples to understand something.
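To make the "quadratic" point concrete, here is a minimal sketch of the score step in plain full attention, written in pure Python for readability. The function names `attention_scores` and `count_comparisons` are my own for illustration, not from any library, and the sketch leaves out the softmax, the value projection, and multiple heads.

```python
import math


def attention_scores(queries, keys):
    """Scaled dot-product scores (before softmax) for full attention.

    Every query is compared against every key, so n tokens
    produce an n x n matrix of scores.
    """
    d = len(keys[0])  # embedding dimension, assumed equal for all vectors
    return [
        [sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d) for k in keys]
        for q in queries
    ]


def count_comparisons(n_tokens):
    """Number of query-key comparisons full attention makes over n tokens."""
    return n_tokens * n_tokens


# Toy example: 3 tokens with 2-dimensional embeddings (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = attention_scores(tokens, tokens)
```

Doubling the sequence length quadruples `count_comparisons`, which is why attending over a whole book at once gets expensive so quickly.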
Personally, I think recursive architectures and test-time training have a better chance in the long term than current full attention.
Also, I think that OpenAI's biggest contribution is demonstrating that reasoning-like behaviors can emerge from really good language modelling.