
I think that’s provably incorrect for the current approach to LLMs. They all have a horizon over which they correlate tokens in the input stream.

So, for any LLM, if you intersperse more than that number of ‘X’ tokens between each useful token, they won’t be able to do anything resembling intelligence.
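To make the construction concrete, here's a minimal sketch of that dilution trick (hypothetical helper, not from any comment above): intersperse runs of filler tokens between the useful ones, so that once the run length exceeds the model's effective attention horizon, the useful tokens can no longer be correlated with each other.

```python
def dilute(prompt: str, filler: str = "X", run_length: int = 1000) -> str:
    """Insert `run_length` filler tokens between each word of the prompt."""
    pad = " ".join([filler] * run_length)
    words = prompt.split()
    return (" " + pad + " ").join(words)

# A small run_length just to show the shape of the output:
diluted = dilute("How many days are there in a week?", run_length=5)
print(diluted[:60])
```

With `run_length` in the millions, the same question is still in there, just spread far beyond what any fixed context window can span at once.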

The current LLMs are a bit like n-gram databases that do not use letters, but larger units.



I think that's an oversimplification. LLMs have a limited context window of tokens. But that isn't necessarily a limitation: it's been proven that an LLM can simulate any algorithm even with a limited context window (https://arxiv.org/abs/2410.03170), making it computationally universal.

Even if that weren't true, context windows can already be quite large and will get bigger as people figure out how to optimize LLMs. For example, Gemini 1.5 has a context of 2 million tokens. A book is typically around 120,000 words, so that's more than 16 books. So one could argue that with a context this big they could construct reasoning chains involving far more disparate pieces of information than humans typically work with simultaneously, and arguably that demonstrates intelligence as well.
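The back-of-the-envelope arithmetic is worth spelling out, since tokenizers typically produce somewhat more than one token per English word (~1.3 is a common rule of thumb, and an assumption here), so the real book count is lower than a pure word count suggests:

```python
# Rough arithmetic behind the books-per-context-window claim.
context_tokens = 2_000_000   # Gemini 1.5's advertised context window
words_per_book = 120_000     # typical book length from the comment
tokens_per_word = 1.3        # assumed average; varies by tokenizer and language

books = context_tokens / (words_per_book * tokens_per_word)
print(f"~{books:.1f} books fit in the context window")
```

Even at ~12-13 books rather than 16+, the point stands: that's far more text than a human can hold in working memory at once.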


Isn’t that a bit of an unfair sabotage?

Naturally, humans couldn’t do it either, unless they edited the input to remove the X’s. But shouldn’t we evaluate the ability (even the intelligent ability) of LLMs on what they can generally do, rather than amplifying their weaknesses?


Why is that unfair in reply to the claim “At this stage I assume everything having a sequential pattern can and will be automated by LLM AIs.”?

I am not claiming LLMs aren’t or cannot be intelligent, or even that they cannot do magical things; I just rebutted a statement about the lack of limits of LLMs.

> Naturally, humans couldn’t do it, even though they could edit the input to remove the X’s

So, what are you claiming: that they cannot or that they can? I think most people can and many would. Confronted with a file containing millions of X’s, many humans will wonder whether there’s something other than X’s in the file, do a ‘replace all’, discover the question hidden in that sea of X’s, and answer it.
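That ‘replace all’ step is mechanical enough to write down; here is a rough sketch of it (hypothetical helper names), stripping the filler runs and collapsing the leftover whitespace to recover the hidden question:

```python
import re

def recover(text: str, filler: str = "X") -> str:
    # Drop every standalone run of the filler character, then collapse whitespace.
    cleaned = re.sub(rf"\b{re.escape(filler)}+\b", " ", text)
    return re.sub(r"\s+", " ", cleaned).strip()

sea = "How X X X many X X days X are X there X in X a X week?"
print(recover(sea))  # -> "How many days are there in a week?"
```

The point being that the human strategy isn’t to attend to millions of X’s; it’s to apply a cheap transformation first and then read the result.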

There are even simple files where most humans would easily spot things without having to think of removing those X's. Consider a file

   How         X X X X X X
   many        X X X X X X
   days        X X X X X X
   are         X X X X X X
   there       X X X X X X
   in          X X X X X X
   a           X X X X X X
   week?       X X X X X X
with a million X’s on the end of each line. Spotting the question in that is easy for humans, but impossible for the current bunch of LLMs.


This is only easy because the software does line wrapping for you, mechanistically transforming the hard pattern of millions of symbols into another that happens to be easy for your visual system to match. Do the same for any visually capable model and it will get that easily too. Conversely, make it a single line (like the one a transformer sees) and you will struggle much more than the transformer, because you'll have to scan millions of symbols sequentially looking for patterns.

Humans have weak attention compared to it; this is a poor example.


If you have a million Xs on the end of each line, when a human is looking at that file, he's not looking at the entirety of it, but only at the part that is actually visible on-screen, so the equivalent task for an LLM would be to feed it the same subset as input. In which case they can all answer this question just fine.
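A sketch of that “equivalent task” (hypothetical helper, assuming a fixed-width viewport): hand the model only what a human actually sees on screen, i.e. each line truncated to the terminal width, rather than the full multi-megabyte file.

```python
def viewport(lines, width=80):
    """Return what a human actually sees: each line cut off at the screen width."""
    return "\n".join(line[:width] for line in lines)

# Rebuild the file from the example above: a word, then a million X's per line.
lines = [word.ljust(12) + "X " * 1_000_000
         for word in ["How", "many", "days", "are", "there", "in", "a", "week?"]]

print(viewport(lines, width=24))
```

The truncated view is a few hundred characters, which fits comfortably in any current model's context window, and the question is plainly visible in it.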


I would also add that if I saw a file full of a million X's, I would not track each X as a distinct item in my mind, I would simplify the visual input down to "this is a file containing a lot of Xs" and work with that lightweight abstraction instead.


> If you have a million Xs on the end of each line

Hmm, I wonder if adding a compression layer during encoding would help?


The follow-up question is "Does it require a paradigm shift to solve it?". And the answer could be "No": episodic memory, hierarchical learnable tokenization, online learning, or whatever else works well on GPUs.



