No. LLMs can take any type of data. Text is simply a string of symbols. Images, video and music can also be encoded as strings of symbols. The model is the same algorithm, just trained on different types of data.
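To make the "string of symbols" point concrete, here is a rough sketch, not how any particular model does it. It assumes the simplest possible encoding, one symbol per raw byte; real systems typically use learned tokenizers or image patches instead. The helper name `to_symbols` and the tiny 2x2 "image" are made up for illustration.

```python
def to_symbols(data: bytes) -> list[int]:
    """Map raw bytes to integer symbols in the range 0..255 (illustrative only)."""
    return list(data)


# Text: the UTF-8 bytes of a short string.
text_symbols = to_symbols("cat".encode("utf-8"))

# Image: a hypothetical 2x2 grayscale image, flattened row by row.
image_symbols = to_symbols(bytes([0, 128, 255, 64]))

# Both are now plain sequences of integers drawn from the same 256-symbol
# vocabulary, so the same sequence model could in principle be trained on either.
print(text_symbols)   # [99, 97, 116]
print(image_symbols)  # [0, 128, 255, 64]
```

Once both inputs are in this form, nothing in the sequence itself says "this is text" or "this is an image"; that distinction lives in the training data.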

I never said cognition was limited to text. I just limited the topic itself to cognition involving text.