Hacker News

They work in a token space whose metric structure is given by proxies for concepts. So at a point in this space I can "walk towards" points which cluster around the token "dog".
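A minimal sketch of what "walking towards the dog cluster" means geometrically, using made-up toy vectors (no real model's embeddings), where nearness is cosine similarity:

```python
import numpy as np

# Hypothetical 3-d "token space"; vectors are invented for illustration.
embeddings = {
    "dog":        np.array([0.9, 0.8, 0.1]),
    "cat":        np.array([0.85, 0.75, 0.2]),
    "bone":       np.array([0.7, 0.6, 0.3]),
    "carburetor": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: the usual proxy for 'nearness' in token space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "Walking towards" the dog cluster amounts to ranking tokens by
# similarity to "dog": associated tokens rank high, unrelated ones low.
ranked = sorted(embeddings,
                key=lambda t: cosine(embeddings[t], embeddings["dog"]),
                reverse=True)
print(ranked)  # ['dog', 'cat', 'bone', 'carburetor']
```

This is exactly the weak "association" structure described below: "cat" lands near "dog" purely by geometry, with nothing said about composition or counterfactuals.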

This is a weak model of some features of concepts, e.g. association: "dog" is associated with "cat", etc. But it does not model, e.g., composition, nor intension, nor the role of the term in counterfactuals. (See my comment elsewhere in this thread on this issue.)

However, you can always brute-force your way to apparent performance in some apparently conceptual skill if the kinds of questions you ask are similar to the training data. So, e.g., if someone has asked "if dogs played on Mars, would they be happy?" or a similar-enough family of questions, then that allows you to have a "dog" cluster around literal facts and a "dog" cluster around some subset of pre-known counterfactuals.

If you want to see the difference between this and genuine mental capability, note that there are infinite combinations of concepts of arbitrary depth, which can be framed in an infinite number of counterfactuals, and so on. And a child armed with only those basic components, and the capacity for imagination, can evaluate this infinite variety.

This is why we see LLMs being used most in narrow fields (esp. software engineering), where the kinds of "conceptual work" they need have been extremely well documented and are sufficiently stable to provide some utility.



As always with definitive assertions regarding LLMs' incapacities, I would be more convinced if one could demonstrate those assertions with an illustrative example on a real LLM.

So far, the ability of LLMs to manipulate concepts has been indistinguishable in practice from "true" human-level concept manipulation. And not just for scientific, "narrow" fields.


The problem with mental capacities is that they are not measured by tests. We have no valid and reliable way of determining them, which is why "metrics psychology" is a pseudoscience.

If I give a child a physics exam and they score 100%, it could be either because they're genuinely a genius (possessing all relevant capabilities and knowledge) or because they cheated. Suppose we don't know how they're cheating, but they are. Now, how would you find out? Certainly not by feeding them more physics exams; at the least, it's easy enough to suppose they can cheat on those too.

The issue here is that the LLM has compressed basically everything written in human history, and the question before us is "to what degree is a 'complex search' operation expressing a genuine capability, vs. cheating?"

And there is no general methodological answer to that question. I cannot give you a "test", not least because I'm required to give it to you in token-in/token-out form (i.e., written), and this dramatically narrows the scope of capability-testing methods.

E.g., I could ask the cheating child to come to a physics lab and perform an experiment -- but I can ask no such thing of an LLM. One thing we could do with an LLM is have a physics-ignorant person act as an intermediary with it, and see if they, with the LLM, can find the charge on the electron in a physics lab. That's highly likely to fail with current LLMs, in my view -- because much of the illusion of their capability lies in the expertise of the prompter.

> has been indistinguishable in practice from "true" human-level concept manipulation

This claim indicates you're begging the question. We do not use the written output of animals' mental capabilities to establish their existence -- that would be gross pseudoscience; so to say that LLMs are indistinguishable from anything relevant indicates you're not aware of what the claim of "human-level concept manipulation" even amounts to. It has nothing to do with emitting tokens.

When designing a test to see whether an animal possesses a relevant concept, can apply it to a relevant situation, can compose it with other concepts, and so on -- we would never look to linguistic competence, which even in humans is an unreliable proxy: hence the need for decades of education and the high fallibility of exams.

Rather, if I were assessing "does this person understand 'dog'?", I would be looking for contextual competence in applying the concept across a very broad role in reasoning processes: identification in the environment, counterfactual reasoning, composition with other known concepts in complex reasoning, and the like.

All LLMs do is emit text as if they have these capacities, which makes a general solution to exposing their lack of them basically methodologically impossible. Training LLMs is an anti-inductive process: the more tests we provide, the more they are trained on them, so the tests become useless.

Consider the following challenge: there are two glass panels; one is a window, and the other is a very high-def TV showing a video-game simulation of the world outside the window. You are fixed at a distance of 20 meters, and can only test each glass pane by taking a photograph of it and studying the photograph. Can you tell which pane shows the real outside? In general, no.

This is the grossly pseudoscientific experimental restriction people who hype LLMs impose: the only tests are tokens-in, tokens-out -- "photographs taken at a distance". If you were about to be thrown against one of these glass panels, which would you choose?

If an LLM were, based on token-in/token-out analysis alone, put in charge of a power plant: would you live nearby?

It matters whether these capabilities exist, because if they are real, the system will behave as expected according to those capabilities. If it's cheating, then when you're thrown against the wrong window, you fall out.

LLMs are, in practice, incredibly fragile systems whose apparent capabilities quickly disappear when the kinds of apparent reasoning they need to engage in are poorly represented in their training data.

Consider one way of measuring the capability to imagine that isn't token-in/token-out: energy use and time-to-compute.

Here, we can say for certain that LLMs do not engage in counterfactual reasoning. E.g., we can give a series of prompts (p1, p2, p3, ...) which require increasing complexity in the imagined scenario -- e.g., exponentially more diverse stipulations -- and we do not find O(answering) to follow O(prompt-complexity). Rather, the search strategy is always the same for a single-shot prompt, so no trace through an LLM involves simulation. We can get mildly-above-linear (apparent) reasoning complexity with chain-of-thought, but this likewise does not follow the target O().

The kinds of time-to-compute we observe from LLM systems are entirely consistent with a "search and synthesis over token space" algorithm that only appears to simulate if the search space contains prior exemplars of simulation. There is no genuine capability.
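The compute argument can be made concrete with a back-of-the-envelope sketch. The 2 * params * tokens FLOPs figure is a standard rough approximation for a decoder-only transformer forward pass; the model size and prompt lengths below are hypothetical:

```python
# Sketch: single-forward-pass compute scales with parameter count and
# token count only -- it cannot depend on how hard the imagined
# scenario is, because the same fixed computation runs either way.

def forward_flops(n_params: int, n_tokens: int) -> int:
    """Rough FLOPs estimate for one forward pass over n_tokens."""
    return 2 * n_params * n_tokens

n_params = 7_000_000_000  # hypothetical 7B-parameter model

# Two prompts of equal token length: one trivial, one stipulating a
# far more constrained counterfactual scenario.
trivial_tokens = 50
counterfactual_tokens = 50

print(forward_flops(n_params, trivial_tokens) ==
      forward_flops(n_params, counterfactual_tokens))  # True
```

Under this estimate, answering cost tracks token counts, not scenario complexity -- which is the O() mismatch the paragraph above points at.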


"…we would never look to linguistic competence".

On the contrary, I strongly believe that what LLMs have proved is what linguists have always told us: that language provides a structure on top of which we build our experience of concepts (the Sapir-Whorf hypothesis).

I don't think one can conceptualize much without the use of a language.


> I don't think one can conceptualize much without the use of a language.

Well, a great swath of the animal kingdom stands against you.

LLMs have invited yet more of this pseudoscience. It's a nonsense position in the empirical study of mental capabilities across the animal kingdom -- something previously believed only by idealist philosophers of the early 20th century and prior. Now it is brought back so people can maintain their self-image in the face of their apparent self-deception: better to opt for gross pseudoscience than to admit we're fooled by a text-generation machine.


I would agree with this if the LLM never really modified the initial linear embeddings, but the non-linearity in the MLP layers and the position/correlation mixing in the attention layers mean that things are not so simple. I’m pretty sure there are papers showing compositionality and so on being represented by transformers.
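The non-linearity point is easy to demonstrate. A minimal sketch of a transformer-style feed-forward sublayer, with tiny hand-picked toy weights (not from any real model) so the result is checkable by hand:

```python
import numpy as np

# Toy ReLU feed-forward sublayer: mlp(x) = W2 @ relu(W1 @ x).
# With these weights, mlp(x) computes |x[0]|.
W1 = np.array([[ 1.0, 0.0],
               [-1.0, 0.0]])
W2 = np.array([[ 1.0, 1.0]])

def mlp(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

a = np.array([ 1.0, 0.0])
b = np.array([-1.0, 0.0])

# A purely linear map would satisfy mlp(a + b) == mlp(a) + mlp(b).
print(mlp(a + b)[0])          # 0.0
print((mlp(a) + mlp(b))[0])   # 2.0 -> superposition fails: non-linear
```

So the layer does not merely re-shuffle the initial linear embedding geometry; whether that non-linearity buys genuine compositionality is the open question in the thread.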



