
Amazing insight, particularly section 6.

"- The two important but different abilities of GPT-3.5 are *knowledge* and *reasoning*. Generally, it would be ideal if we could *offload the knowledge part to the outside retrieval system and let the language model only focus on reasoning.* This is because: - The model’s internal knowledge is always cut off at a certain time. The model always needs up-to-date knowledge to answer up-to-date questions. - Recall we have discussed that is 175B parameter is heavily used for storing knowledge. If we could offload knowledge to be outside the model, then the model parameter might be significantly reduced such that eventually, it can run on a cellphone (call this crazy here, but ChatGPT is already science fiction enough, who knows what the future will be)."

& "Yet there was a WebGPT paper published in Dec 2021. It is likely that this is already tested internally within OpenAI."

It definitely feels like this may be the next step in making this kind of system robust. It ends up being an interface for search.



It's unclear to me how you could separate knowledge and reasoning:

- Reasoning typically requires base knowledge to work from. A side effect of training reasoning is embedding knowledge into the model parameters.

- Even if you offload the search portion (either through outputting special tokens that are postprocessed, or applying the model in multiple steps with postprocessing), you still need embedded knowledge for the model to decide what to search for, and then to successfully integrate that knowledge (in the multi-step case).
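The multi-step case can be sketched as a loop: the model emits a special search token, an external tool fills in the result, and generation continues with the retrieved text in the prompt. A toy illustration with stubbed model and search functions (everything here is hypothetical, not a real API):

```python
def fake_model(prompt):
    """Stub standing in for an LLM: emits a [SEARCH: ...] token when it
    decides it lacks a fact, and answers normally otherwise."""
    if "RESULT" not in prompt:
        return "[SEARCH: age of Barack Obama]"
    return "He was born in 1961."

def fake_search(query):
    """Stub standing in for an external retrieval system."""
    return "RESULT: Barack Obama, born August 4, 1961"

def answer(question, model, search, max_steps=3):
    prompt = question
    for _ in range(max_steps):
        out = model(prompt)
        if out.startswith("[SEARCH:"):
            query = out[len("[SEARCH:"):-1].strip()
            # Feed the retrieved knowledge back into the prompt. Note the
            # model still needs enough embedded knowledge both to pick the
            # query and to integrate the result -- the point made above.
            prompt += "\n" + search(query)
        else:
            return out
    return out

print(answer("How old is Obama?", fake_model, fake_search))
```

The loop itself is trivial; the open question is exactly the one raised here, namely how much knowledge must stay inside the model for the search-and-integrate steps to work.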

Maybe some kind of post-facto pruning of model weights?


Reasoning is that which knows that it lacks some necessary knowledge, whereas knowledge isn't aware that it lacks some necessary reasoning.


Fascinating - makes me think of the distinction between dreaming and being awake. Only in the latter state can one tell the difference.


Not quite true, see lucid dreaming.


To be fair to the GP comment, I've personally never experienced lucid dreaming, so for me the distinction holds - when I'm dreaming, I never know I'm dreaming, and mostly I don't even remember my life awake, I only remember whatever's in the context of the dream.


That would be a third, distinct state from each. Dreaming while knowing that you are is quite a thrill!


This has to be one of the most insightful sentences I've ever read.


Would you mind expanding on it a bit? I do sincerely appreciate its pithiness, but curious to read it explained a bit further.


Think of it as: reasoning=computation, knowledge=data. Data alone doesn’t say it must be computed. But computation, by definition, is attempting to create data (the result) that doesn’t exist. Thus: knowledge isn’t aware it must be reasoned about, but reasoning knows it’s trying to find (deduce, compute) knowledge it lacks.


I disagree. If you have knowledge, and you don't try to do anything other than compress it to save space, reasoning about that knowledge will emerge as a sheer unintended consequence once the patterns you're compressing reach some threshold of sophistication.

By definition, an optimal compression algo is a dimensionality reduction algo. A dimensionality reduction algo lets you do a bunch of machine learning tasks.
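As a loose illustration of that claim, a truncated SVD is both a lossy compressor and a dimensionality reducer: the same low-rank factors that save space also serve as machine-learning features. A sketch with numpy on toy data (this is an analogy, not a claim about LLM internals):

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 samples that really live on a 3-dimensional subspace of R^50
basis = rng.normal(size=(3, 50))
data = rng.normal(size=(100, 3)) @ basis

# "Compress": keep only the top-k singular components
U, S, Vt = np.linalg.svd(data, full_matrices=False)
k = 3
codes = U[:, :k] * S[:k]   # compact representation, 100 x 3 instead of 100 x 50
recon = codes @ Vt[:k]     # decompression

# Near-lossless here, because the structure was low-dimensional to begin with
err = np.linalg.norm(data - recon) / np.linalg.norm(data)
print(round(err, 6))
# `codes` doubles as a feature matrix for clustering, classification, etc.
```

The compression objective never mentions learning, yet the compressed representation is exactly what downstream ML tasks want, which is the "unintended consequence" the comment describes.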


In the world of large language models, what part of "reasoning" is hard-coded and what part, if any, is learnt?

Is reasoning simply a scan/search of your vector space (i.e. your knowledge) according to some hard-coded algo?


Thanks for that!


Doesn't seem too unreasonable to me. If I asked you "How old is Obama" and you had a data source with the ages of every person, you wouldn't need to know the answer from memory. Your reasoning tells you that to find out how old someone is, you need to check the external resource, and what to do with the info once you get it.
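In that framing, the "reasoning" is just the routing decision: recognize the shape of the question, consult the resource, post-process the result. A minimal sketch (the data source, pattern, and date are all invented for illustration):

```python
import re
from datetime import date

# Stand-in for the external data source of birth dates
BIRTH_DATES = {"barack obama": date(1961, 8, 4)}

def answer_age(question, today=date(2023, 1, 1)):
    m = re.match(r"how old is (.+)\?", question.lower())
    if not m:
        return None
    name = m.group(1)
    born = BIRTH_DATES.get(name)  # the knowledge lives outside the "model"
    if born is None:
        return "I don't know who that is."
    # The "reasoning": date arithmetic applied to the retrieved fact
    years = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return f"{years} years old"

print(answer_age("How old is Barack Obama?"))
```

The hard part a real model faces, as the replies below point out, is the step this sketch hand-waves: mapping "Obama" in free text onto the right key in the data source.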


The required knowledge is around entity recognition, with "Obama" referring to the 44th POTUS, and not somebody else who happens to have the same surname (and there are multiple of them actually, at least 4 given his family).


This can clearly be guessed from a search as well. Popularity can be well defined, and in the case of Obama, there is clearly one much more popular than the others.


The model still needs to infer from the sentence which entity to look up. This is also a relatively simple example, as 'Obama' refers to a single class of entities and there is not a lot of ambiguity around resolution of class, only resolution of the specific entity.

Take this sentence:

> When was KitKat released?

It could refer to the sweet, or to the Android OS. Vastly different classes, and the model here needs to "decide" to ask for more information to disambiguate the class; and if the class is the sweet, it may then need to disambiguate the particular flavour, and even ask for the geographic location.
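One crude way to make that "decide to ask" behaviour concrete: score the candidate senses and only answer when the top sense clearly dominates, otherwise ask. The candidates, scores, and threshold below are all invented for illustration:

```python
def resolve(entity, candidates, margin=0.5):
    """candidates: list of (sense, score) pairs. Ask for clarification
    unless the best sense beats the runner-up by at least `margin`."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        senses = ", ".join(s for s, _ in ranked)
        return f"Which {entity} do you mean: {senses}?"
    return ranked[0][0]

# "Obama": one sense dominates, so just answer
print(resolve("Obama", [("Barack Obama (44th POTUS)", 0.95),
                        ("other people named Obama", 0.05)]))
# "KitKat": two plausible classes, so ask
print(resolve("KitKat", [("the chocolate bar", 0.55),
                         ("Android 4.4", 0.45)]))
```

This is essentially the uncertainty measure discussed further down the thread: the interesting part is getting scores that stay calibrated under unbalanced distributions and context drift.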


Yes, but the amount of knowledge necessary to decide how to make those sorts of decisions is far smaller than the amount of knowledge necessary to answer all such questions.


And that's perfectly fine. Humans have exactly the same problem. They will get this wrong, and you will reply "no, I'm talking about the android version". Language is ambiguous so we cannot expect machines to get it right all the time.


I do agree with you that it is fine, what I was getting at was that there needs to be a way to measure uncertainty in a manner that is robust to unbalanced distributions or context drifting.


Okay I see what you mean, I agree with you on this.


I hope nobody ever releases the famous Kitkat Club in Berlin from its chains. Because there are not so many.

My experience with ChatGPT is that it gets what I mean very well from the context.


Fair, although I’d think it acceptable if the response to your prompt was “which one? I found 4”


It would have way more than 4 people in the search results though. GP said there's at least 4 because there's him, his wife and his kids.

Even knowing Obama is a person is a knowledge-based leap. (To us humans) it's obvious the question means Barack Obama, because he's the most notable subject for that name. But how do you prevent your AI from responding that the "Obama JS library is 5 years old"?

https://github.com/rgbkrk/obama


If it's based on frequency of training data, the ex-president will have far more hits in the training corpus.

Now, on less talked-about topics it doesn't sound any different from what happens with people:

Q "How old is Tim?"

A "Which Tim are you talking about, you didn't give me crap to work with?"


The answer is obviously 62 because that's how old Tim Apple is.


Ha, it might give an age of 7 years or so when the moniker was invented.


Isn't it a sort of cultural knowledge to understand what is meant? I.e., when we say Obama we mean one specific, very important guy, but if I'm talking about Mrs Watanabe it's a generic Japanese person?


You can already coax ChatGPT into interacting with external systems today; I set up a prompt where the model pretended to be a factory system on a communication bus. It could access its "inventory" by posting a prefixed message to the communication bus.

After a bit of prompt engineering the model could query inventory, "manufacture" various recipes, and store the end products in inventory.

It might be possible to look at the weight activations as it reasons through contacting the external system over the emulated communication bus? For a suitably varied set of commands you might be able to find a subset of weights that are most correlated to the task and prune the others. Then you'd be left with a model that can retrieve and store information, as well as perform reasoning tasks.

Still has problems with working memory (the input token limit, since the model is auto-regressive) given all the external information is coming back in via the prompt, but ChatGPT seems to handle that gracefully right now.
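The pattern described above, where the model addresses external systems by a message prefix on an emulated bus and a harness intercepts those messages, can be sketched as a driver loop. The prefixes, stub model, and inventory are all stand-ins for the prompt-engineered setup, not a real ChatGPT API:

```python
def run_on_bus(model_step, handlers, user_msg, max_turns=5):
    """Drive a chat model that addresses external systems by message
    prefix (e.g. 'INV:' for inventory), as in the factory-bus prompt."""
    transcript = [f"USER: {user_msg}"]
    for _ in range(max_turns):
        out = model_step(transcript)
        prefix, _, payload = out.partition(":")
        if prefix in handlers:
            # Intercept the bus message, run the real system, and feed
            # its reply back into the conversation via the prompt.
            transcript.append(f"{prefix}-REPLY: {handlers[prefix](payload.strip())}")
        else:
            return out  # plain text: the model is talking to the user
    return out

inventory = {"iron": 3}

def fake_model(transcript):
    """Stub: queries inventory once, then answers the user."""
    if not any(t.startswith("INV-REPLY") for t in transcript):
        return "INV: count iron"
    return "We have 3 iron in stock."

print(run_on_bus(fake_model, {"INV": lambda q: inventory["iron"]}, "How much iron?"))
```

Everything the external system returns re-enters through the transcript, which is exactly where the working-memory (token-limit) problem mentioned above bites.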


> It's unclear to me how you could separate knowledge and reasoning

yeah, me too. There are a few cases I'd point your attention to.

1, preschool toys, kids somehow manage to put the square peg in the square hole. I mean, they may chew on them or push them around, but there's a "moment of magic" when they make it all click together. Maybe there's some implicit knowledge there, I know I played games like that, but I don't remember.

2, sudoku. you don't really need to know anything, just make each row, column and box different. no memorization, just look. but what about the rules? does that count as knowledge?

I've been reading some math books lately, and I think we're not alone. Coping with sets of sets is a hard question that people have been wondering about for, as far as I can tell, a long time.

For now, it's probably safe to say that knowledge about knowledge is different than just knowledge, and having one layer work on k1 and another layer work on k2 is ok. maybe someday add k3...kn. Other fields do that. Worth checking out.

I think, we could both get very fussy about what exactly that _means_. But for now, I'm happy to be charitable in my reading. I'd also expect them to run into some really thorny problems when they try to pin down exactly what's going on, just like everybody else does. For today, good for them. Seems like a nice win.


Different people might think differently, but when solving complex problems I do think I have separate "think about / gather the facts" and "formulate the solution" phases.

I don't think about the totality of facts in the world - I think my brain mentally extracts the facts that are relevant to the problem and then reasons about those facts.

There is certainly back-and-forth though: I go "here is a bit of information, how does that apply? ok, but what about this fact? ok, here is how that would apply considering something else..." but I think this is still a gather -> solve -> gather -> solve loop.


Humans (or at least I) have the ability to look up knowledge. What about an architecture where the main model has a background conversation with a knowledge model, just like I need to look up things I remember exist but forget the details of? Heck, such a knowledge model would have great value for people as well.

Using written conversation as an interface between language models feels natural and completely bonkers at the same time.


I don’t know why you would want to separate them completely. You have some knowledge in your head, store some other in notes on your desk and some in a library (the kind with books in it).

You can make the model write to a local wiki when it encounters new information it feels it needs to store, and read from its own wiki when it needs to look something up. You can also make it spend time randomly browsing the knowledge base, reconsolidating, reorganising and labeling it.

The architecture of the wiki doesn't have to be "clever" in any way. It is just an old-fashioned database with a query function the model can write to and query from.
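The un-clever version really is this small. A sketch with an in-memory SQLite table standing in for the wiki (the schema and topic keys are made up; a real integration would also need the model to emit write/read commands, as in the bus example elsewhere in the thread):

```python
import sqlite3

# A deliberately un-clever store: facts keyed by topic, nothing more.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE wiki (topic TEXT, fact TEXT)")

def remember(topic, fact):
    """What the model calls when it encounters new information."""
    db.execute("INSERT INTO wiki VALUES (?, ?)", (topic, fact))

def recall(topic):
    """What the model calls when it needs to look something up."""
    rows = db.execute("SELECT fact FROM wiki WHERE topic = ?", (topic,)).fetchall()
    return [r[0] for r in rows]

remember("kitkat", "Android 4.4 is codenamed KitKat")
remember("kitkat", "KitKat is a chocolate bar by Nestle")
print(recall("kitkat"))
```

The "reconsolidating and reorganising" pass would just be the model reading rows back, merging duplicates, and rewriting them, with no special machinery in the store itself.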


Reasoning can be defined in the abstract. Knowledge cannot.

For example, if I need A & B | D & E to get C, I can reason that if I have B and want C, I need A, or D & E.

Once I acquire this reasoning skill, I can apply it to any kind of "bool-sequence X required for Y" situation, regardless of what specifically X and Y are, or how many entities X encompasses.
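That abstract rule can literally be written once and reused across domains, which is the point. A toy encoding where a goal's requirements are a list of alternative conjunctions (sets), illustrating the transfer:

```python
def missing_for(goal_requirements, have):
    """goal_requirements: list of alternative sets, e.g. "C needs
    (A and B) or (D and E)" becomes [{'A','B'}, {'D','E'}].
    Returns, for each alternative, what is still missing."""
    return [alt - have for alt in goal_requirements]

# "If I have B and want C, I need A, or D & E"
c_needs = [{"A", "B"}, {"D", "E"}]
print(missing_for(c_needs, {"B"}))   # still need A, or D and E

# Same reasoning applied to a different domain, with zero new code:
dry_needs = [{"raincoat"}, {"umbrella"}]
print(missing_for(dry_needs, set()))  # still need a raincoat, or an umbrella
```

The function never mentions rockets or raincoats; the domain knowledge is entirely in the data it is handed, which is exactly the separation the comment is describing.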

Whereas if I know that a rocket engine requires an oxygen/methane mix to function, I cannot transfer that to the knowledge that I need a raincoat or umbrella in order to avoid getting wet in the rain.


The problem with ChatGPT's "knowledge" is that it isn't trustworthy. It will happily output very confident-sounding nonsense, or blatantly incorrect statements. We need a way to verify how accurate its outputs are.


ChatGPT made this nice COBOL program to create an S3 bucket, a technical impossibility...

  IDENTIFICATION DIVISION.
  PROGRAM-ID. CREATE-S3-BUCKET.

  ENVIRONMENT DIVISION.
  CONFIGURATION SECTION.
  INPUT-OUTPUT SECTION.

  DATA DIVISION.
  FILE SECTION.
  WORKING-STORAGE SECTION.
  01 AWS-ACCESS-KEY PIC X(20).
  01 AWS-SECRET-KEY PIC X(40).
  01 BUCKET-NAME PIC X(255).

  PROCEDURE DIVISION.
  CREATE-BUCKET.
      MOVE AWS-ACCESS-KEY TO AWS-ACCESS-KEY-VAR
      MOVE AWS-SECRET-KEY TO AWS-SECRET-KEY-VAR
      MOVE BUCKET-NAME TO BUCKET-NAME-VAR
      INVOKE AWS-S3 "CREATE-BUCKET" USING AWS-ACCESS-KEY-VAR
          AWS-SECRET-KEY-VAR BUCKET-NAME-VAR


How is that impossible? Plenty of libraries are available for COBOL, especially if you use COBOL.NET


What is the COBOL SDK for AWS?


Probably someone sells one, or you'd just use the AWS SDK for .NET via COBOL.NET.


There are http client libraries for COBOL, and it’s easy to use http to make S3 api calls.


What is the technical impediment to writing one?


"Eww, Cobol"


Just ask ChatGPT to implement it.


That was ChatGPT's first response, yes.


Whose knowledge is trustworthy? We've somehow come to associate certain institutions or scientific authorities with truth when that is about the furthest from real science:

"Have no respect whatsoever for authority; forget who said it and instead look what he starts with, where he ends up, and ask yourself, Is it reasonable?" -Richard P. Feynman

"One of the great commandments of science is, "Mistrust arguments from authority." -Carl Sagan

"In questions of science, the authority of a thousand is not worth the humble reasoning of a single individual." -Galileo Galilei


I think part of the issue is that it's easier to test the limits of a human's knowledge, and ironically your quotes supply evidence that trust is crucial: the truest expression of those quotes would be to just deliver the payload and not attach any sort of authority-by-association to it.

You can't trust its answers (to be fair, that's the existing status quo), but you also can't easily test it, because it will return reasonable-sounding garbage. Conversely, you can discover ignorance in most humans pretty quickly by exhausting their ability to respond (or your ability to ask).


A generative system, be it a neural network or a human, needs a way to test ideas in order to align with reality. If testing is available, then it is possible to advance the state of the art. Ideas are cheap, results matter.


Sure, but that doesn’t seem to square with the topic at hand - “why does an infinite truth and lies machine feel less trustworthy than another human”. It just isn’t a question that needs a high degree of abstraction to respond to.


It is sometimes a lie machine because it lacks grounding in verification. Humans get more grounding than language models but even we are not 100% there - remember the antivax hysteria. The most grounded field is science, but even in scientific papers most things don't replicate. Verification is hard on all levels and requires extensive work. In particle physics all scientists clump together around the CERN accelerator as it is the only source of verification they have (almost, I exaggerate a bit).

It's going to be important to develop AI methods to test and verify, I think unverified model outputs are worthless verbiage. Verification can be based on references, code execution, physical simulations, lab experiments and even language based simulations.

In a few years the situation is going to flip, AI is going to become more reliable than humans. Being tested on millions of cases, it will be more trustworthy than us, no human can be tested to that extent. It's going to be interesting to see how we react to super-valid AI. Our guiding role is going to shrink more and more, we will be the children.


Those quotes are not about trust, they are about the rhetorical technique of appeal to authority.

The actual payload though is the mistrust of authority exactly because we are all so susceptible to the logical fallacy of appeal to authority masquerading bullshit as truth.

There is no problem to solve here. ChatGPT should never be an authority on anything.


I, too, always recreate double blind experiments before I take the drugs my doctor gives me :)

I also double-check the transistors in my computer work correctly before I run any code on them, and of course I re-derive the physics to be able to do that :)

In practice you are an expert in a very small domain (if any) and in all the other domains you have no choice but to accept somebody's authority.


That's good, I don't go quite as far, but do try to consult multiple independent sources.

Doctors have been known to overprescribe things like Benzos, and opioids from time to time.

I also just use tools like a RAM diagnostic that can check large numbers of transistors at once. I imagine you're quite good at QM after all that practice applying the wave equation though. Impressive!


There is some kind of recursion in here with authors names and "have no respect..." part:)


Fair point


Actually, it isn't.

Their arguments are of the form "this statement could be false."

If you evaluate it as a true statement, you have no problems. It could be false; you have to evaluate it for yourself instead of trusting some authority.

It's only if you assert that it's certainly false that you have a problem. Because then it's clearly true -- since otherwise these authorities would be telling you something false, which proves their assertion true.

Put another way, it can get you from the undesirable position of blindly trusting authorities to the desirable position of questioning them, but not the other way around. Which is the intended result.


It’s slightly ironic that the only reason we pay attention to these particular quotes is that they come from famous physicists, i.e. authorities.


One way I tried to do this is by having it write an answer with a footnote reference at each fact [1], then list search terms that can be used to verify each claim; then I would respond with the URL and quotes from the found pages for each one, and have it rewrite the answer based on that information and cite the sources. I think something in this direction can be automated. I saw someone do this with math and other tasks, where it would talk to a connected program before answering.
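That draft-verify-rewrite loop automates naturally once the retrieval step sits in the middle. A hedged outline of the data flow, with all four model/search calls stubbed out (a real system would plug in an actual LLM API and search engine):

```python
def verify_and_rewrite(draft_fn, search_terms_fn, search_fn, rewrite_fn, question):
    """1) draft an answer with footnoted claims, 2) get search terms per
    claim, 3) retrieve evidence, 4) rewrite the answer citing sources.
    All four callables are stand-ins for model or search-engine calls."""
    draft, claims = draft_fn(question)
    evidence = {}
    for claim in claims:
        terms = search_terms_fn(claim)
        evidence[claim] = search_fn(terms)  # list of (url, quote) pairs
    return rewrite_fn(draft, evidence)

# Toy stubs showing the data flow only; no real retrieval happens here.
result = verify_and_rewrite(
    draft_fn=lambda q: ("Water boils at 100C [1].", ["water boils at 100C"]),
    search_terms_fn=lambda claim: claim,
    search_fn=lambda terms: [("example.org/boiling", "boils at 100 C at sea level")],
    rewrite_fn=lambda draft, ev: draft + " Source: " + ev["water boils at 100C"][0][0],
    question="When does water boil?",
)
print(result)
```

As the replies note, the weak link is step 3: if the model is allowed to invent the evidence instead of retrieving it, the loop verifies nothing.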


Yes, it's been done both in papers and in various GPT-3 projects. As long as you can find relevant references the LM will become reliable.


I did this as well and it looks great initially but there are already examples of GPT generating totally bogus references and sources. So we're back to square 1.


Sounds like an interesting way to reboot Wikipedia.


I just had a run-in with this yesterday. I asked it to explain box embeddings. It's a pretty niche topic so I didn't expect it to give the right answer, but the answer it gave sounded so confident and was so wrong. It took a normal vector-embeddings explanation and just replaced "vector" with "box". I tried correcting it but it refused to budge and still sounded confident.


I asked it to explain part of my thesis work on Oblivious Transfer, and it gave me a lovely prose description of the Green-Hohenberger Oblivious Transfer protocol. It was clear and confident, and the thing it described was even an actual protocol. It just wasn’t in any way our protocol: GPT just took some classical protocol it found elsewhere and relabeled it.


Sounds like many humans I know.


ChatGPT to be employed in marketing positions immediately.


Think bigger. PresidentGPT. On tweeter!


You are right, this is the pain point - trust, verification. I think it will become the next focus of research.

There are many things we could do to solve this problem. One of them is to use an external reference for verification. Another one is to train the model to verify facts by augmenting the input with lies - adversarial training for lie detection. Problem solving can be improved by generating more data with the current version of LM for the next one, if we can verify the outputs to be correct.


Sure, but you can only verify facts like "when was <someone> born?" - you can easily verify that today with a knowledge database, but that's not what is interesting about ChatGPT. What's interesting is what it can generate that you can't easily fact-check, like "generate me a poem in the style of <someone> and <someone>" - how can you verify that the style is correct automatically? Or "write me code that connects to a non-public system and does <long instruction in words>" - how can you verify the code works properly without access to that system and the ability to run it yourself?


> There are many things we could do to solve this problem.

Just like what social networks have failed to do in years? Not sure it's that simple :-)


so, much like other knowledge sources?


Most knowledge sources don't make up totally fictional citations to nonexistent sources. Or, if they do, nobody uses them for anything serious. Even Wikipedia citations will get removed if they point to URLs that never existed.


if we focus on the best sources, even in studies a lot of research can't be replicated, and if we focus on the most common ones, like newspapers and TV, I'd say most of it is made up or might as well be


Sure, nothing is perfect.

But I'm not talking about it just being wrong, I'm talking about it citing webpages and books that don't exist and never did[0]. If Wikipedia regularly had that sort of quality issue people just wouldn't use it. There's a threshold below which something stops being useful.

[0] Bloggs, Joe. "ChatGPT just makes stuff up". Nature, vol 123, 2022, pp 123-321. Wiley Online Library, https://doi.org/10.1111/111/111


That's just a bad take and it doesn't excuse the problems with GPT.


Ok, but if I read a paper from a well-known author published at NeurIPS or in Nature, I have a good sense of how trustworthy that paper might be. Even if we ask GPT to cite its sources, which it will do, it will also happily generate false sources. It's untrustworthy turtles all the way down.


This is similar to what happens to adults after completing standardized education - memorized knowledge is often discarded or greatly reduced but much of their reasoning capabilities remain. A similar thing happens to children with their phenomenological sensitivity being reduced and their emotional model remaining. Emotions shape intuition when we lack resources for reasoning, while reasoning shapes intelligence when we lack knowledge resources.

This suggests that there is some underlying structure related to our EQ and IQ that we learn through our bodies and the knowledge we gather from the world. The relationship between memory distillation, emotions, and reasoning could lead to some insights as to what this structure is. I would speculate that the refined structure is universal for all conscious beings, and that it can be formulated as a theory involving geometric invariance, similar to the standard model.

The LLM as simulators description is apt [0]. ChatGPT can be understood as an interface for navigating a knowledge space that offloads most reasoning to its users, much like a search engine. Generative models like GPT create a latent space but their ability to navigate it relies on flowing along the natural latent topology, meaning it uses probabilistic reasoning and needs carefully constructed prompts to find good starting points that don't descend into local extrema. Alternatively, the latent space could be given guard rails through RLHF or have base knowledge distilled and curated to smooth out the resulting topology.

[0] https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators


I think the chain-of-thought reasoning will be what fixes this. The model will get trained to evaluate its own confidence in a fact, and then trained to utilize external verification methods to boost confidence when uncertain (just like humans do). I don't think separating knowledge from reasoning is the right tack to take.


How much disk space does 175B parameters use? A float or half precision float per parameter or does it need pointers to connections too?

Given how responses are generated in seconds and for free I am fairly sure it could run on a desktop computer.


One float per param, so naively 175B × 4 bytes ≈ 700GB on disk. Most recent models are trained in FP16 or BF16, so 350GB. And there's some work on quantizing them to INT8, knocking that down to a mere 175GB. You can definitely run it on a desktop computer using RAM and NVMe offload to make up for the fact that you probably don't have 175GB of GPU memory available, but it won't be fast: https://huggingface.co/blog/bloom-inference-pytorch-scripts
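The arithmetic above, spelled out (175B is rounded, and real checkpoints also carry optimizer state and embedding tables, which are ignored here):

```python
params = 175e9  # parameter count, rounded

def size_gb(bytes_per_param):
    """Naive on-disk size: parameter count times storage width."""
    return params * bytes_per_param / 1e9

print(f"FP32: {size_gb(4):.0f} GB")  # 4 bytes/param -> ~700 GB
print(f"FP16: {size_gb(2):.0f} GB")  # 2 bytes/param -> ~350 GB
print(f"INT8: {size_gb(1):.0f} GB")  # 1 byte/param  -> ~175 GB
```

No pointers to connections are needed for dense transformer layers: the connectivity is implied by the layer shapes, so the weights really are just a flat array of numbers.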

OpenAI generates responses so fast by doing the generation in parallel across something like 8x80GB A100s (I don't know the exact details of their hardware setup, but NVIDIA's open FasterTransformer library achieves low latency for large models this way).


It'd be pretty surprising if you could quantize a text model and have it still work. It has to be using those lower bits to store text; it's not like you can round a letter up or down.


It's not storing any text? The weights are floating point numbers - the "text" is in some extremely high dimensional embedding space.


Of course it's storing text. GPT was trained for less than one epoch; they just continually throw new text in there and it mostly just remembers it (= learns it = compresses it). It's not simply "a high dimensional embedding" because words aren't differentiable; you'll get different words if you round off your "coordinates".

If you go to https://beta.openai.com/playground/ and prompt it "Read me the book Alice in Wonderland" it will quote you word for word the original book.


GPT's compression of text is a model of probabilities for the next token in a sequence, where a token is a bit of text from a vocabulary of ~52,000. You can definitely reduce the precision of the parameters that determine that model without hurting the model's overall accuracy much (consider truncating a probability like 98.0000001221151240690% to 98.0%).

Empirically, people have quantized the weights of language models down to INT4 with very little loss in accuracy; see GLM-130B: https://arxiv.org/abs/2210.02414
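A minimal version of the INT8 idea: store one scale per tensor and round the weights to integers; the relative error stays small, which is why next-token probabilities barely move. This is a toy numpy sketch of symmetric per-tensor quantization, not the actual scheme used in GLM-130B:

```python
import numpy as np

rng = np.random.default_rng(0)
# Weights with a small spread, roughly like a trained layer's
w = rng.normal(scale=0.02, size=10_000).astype(np.float32)

# Symmetric per-tensor INT8 quantization: map [-max, max] onto [-127, 127]
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # stored: 1 byte/weight
w_hat = q.astype(np.float32) * scale                          # dequantized at use time

# Relative reconstruction error, typically on the order of 1% here
rel_err = float(np.linalg.norm(w - w_hat) / np.linalg.norm(w))
print(round(rel_err, 4))
```

The stored tensor shrinks 4x versus FP32 at the cost of that small rounding noise; finer-grained schemes (per-row or per-group scales) push the error lower still, which is how INT4 results like GLM-130B's become plausible.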


For anyone interested, it's called a retrieval transformer. Here is an example of one from Google: RETRO [1][2]

1. https://arxiv.org/abs/2112.04426

2. https://jalammar.github.io/illustrated-retrieval-transformer...


I feel like in my tests, reasoning is currently clearly weaker than knowledge. When asked to provide demonstrations of simple mathematical theorems, I've observed ChatGPT repeatedly confuse assumptions and conclusions, and even get it to "demonstrate" facts that it knows are wrong when asked directly, like 1 = 2.


See REALM[1] for some older(2 years) work on this idea.

1. https://arxiv.org/abs/2002.08909



Or Meta's Atlas[0] for more recent work

[0]https://arxiv.org/abs/2208.03299


How is that amazing?

Restated as “model cannot include info it has not observed” it’s pretty much run of the mill, decades old physics.

It is still a machine under the hood bound by the known laws vetted by experiment.

x86 machines have not taken us beyond the known laws of the shared physical space.


How is hosting the knowledge in a large cloud database any different than hosting the model itself in the cloud? Why the need to run "reasoning" locally?


You could have a locally trained variant that uses a tuned model or set of models, plus local data



