I have no idea about EBM, but I have researched a bit on the language modelling ...

I have no idea about EBM, but I have researched a bit on the language modelling side. And let's be honest, GPT is not the best learner we can create right now (ourselves). GPT needs far more data and energy than a human, so clearly there is a better architecture somewhere waiting to be discovered.

Attention works, yes. But it is not naturally plausible at all. We don't do quadratic comparisons across a whole book or need to see thousands of samples to understand.

Personally I think that in the future recursive architectures and test time training will have a better chance long term than current full attention.

Also, I think that OpenAI biggest contribution is demostrating that reasoning like behaviors can emerge from really good language modelling.