
It's because an LLM is essentially a probability matrix. You type a prompt, then it calculates the probability of the next word, and so on, eventually forming a sentence. The probabilities are learned from the training data.

Because of the underlying probability model, it's not going to be 100% deterministic. Plus, a model like ChatGPT purposely has a "temperature" parameter that adds further randomisation to the whole process.
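The two ideas above can be sketched in a few lines. This is a toy illustration, not any real model's implementation: logits are divided by the temperature before the softmax, so a higher temperature flattens the distribution and makes sampling more random, while a temperature near zero approaches greedy argmax.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample a token id from raw logits, with temperature scaling.

    Higher temperature flattens the distribution (more random);
    temperature -> 0 approaches always picking the most likely token.
    """
    scaled = [l / temperature for l in logits]
    # numerically stable softmax over the scaled logits
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # draw one token id according to those probabilities
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

Generation then just repeats this: append the sampled token to the context and ask for the next distribution, which is why two runs of the same prompt can diverge.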

My answer is based on this paper if you're interested in reading more: The Matrix: A Bayesian learning model for LLMs, https://arxiv.org/abs/2402.03175


Are there any ways to show the source of the information retrieved by the model? For instance, the LLM forms a sentence and it points to a stackoverflow answer with the same or similar content.


As I understand it, that is essentially impossible. If the model had only been trained on a single datum, sure, it would be trivial. But as soon as a second one is fed in, the weights are already a kind of blend of the two sources (so to speak).


It's not impossible, but it's definitely difficult. There is some overlap with the methods used to detect benchmark data contamination, though it's not entirely the same problem. For the detection use case, you already know the text you're looking for, and you're just trying to demonstrate that the model has "seen" the data in its training set. The challenge is proving that it is statistically improbable that the model could stochastically generate the same tokens without having seen them during training.
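One common signal in this line of work (in the style of the Min-K% Prob method from Shi et al., "Detecting Pretraining Data from Large Language Models") is the average of the lowest per-token log-probabilities the model assigns to the candidate text: memorised text tends to have few very surprising tokens. A minimal sketch, with made-up log-probs standing in for values you would actually get from the model under audit:

```python
def min_k_percent_score(token_logprobs, k=0.2):
    """Average the k% lowest per-token log-probabilities.

    Text the model has memorised tends to contain few very
    low-probability tokens, so a higher (less negative) score
    is evidence the text appeared in the training set.
    """
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n

# Toy illustration: these per-token log-probs are invented, not
# taken from any real model.
seen   = [-0.1, -0.3, -0.2, -0.4, -0.1, -0.2]  # memorised text: uniformly likely
unseen = [-0.2, -5.1, -0.3, -6.7, -0.1, -4.9]  # novel text: some surprising tokens
```

In practice you would calibrate a threshold on known-held-out text, since a single score by itself proves nothing.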

Some great research exists in this area [1], and I expect much of it may be repurposed for black-box attribution in the future (in addition to all the work being done in the mechanistic interpretability field).

[1] https://arxiv.org/abs/2311.04850

