>> RE Chomsky: You can see it's like epicycles: with enough parameters, an LLM is like a numerical method for curve fitting, that doesn't explain the data (any more than a fourier transform does). Curiously, they do seem to predict very accurately... yet also generalize strangely ("hallucinate"). What to think?
Well, that's the fundamental problem of modelling: that for any set of observations there's an arbitrary number of models that fit the data with great accuracy and even predict future observations well; and we don't know which one is the best in the long term.
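To make the underdetermination concrete, here's a toy sketch (mine, not from the thread; plain Python, invented example): two models that agree exactly on every observation, yet diverge wildly on the very next prediction.

```python
# Five noiseless observations of an underlying "true" law y = x**2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]

def model_a(x):
    # The simple law.
    return x ** 2

def model_b(x):
    # The same law plus a "wiggle" term that vanishes at every sample point,
    # so the two models are indistinguishable on the data we have.
    bump = x * (x - 1) * (x - 2) * (x - 3) * (x - 4)
    return x ** 2 + 0.5 * bump

# Both models fit all observations exactly...
assert all(model_a(x) == model_b(x) for x in xs)

# ...but disagree badly on the next point.
print(model_a(5.0))  # 25.0
print(model_b(5.0))  # 85.0
```

With enough free parameters you can always manufacture more such models, and the data alone cannot tell you which one to trust.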
The answer is that we should prefer not predictive models, but explanatory theories, that not only predict future observations but also explain why those observations should be expected to be made.
For example, the epicyclical model did not explain anything: it said nothing about why the planets should move on circular orbits with epicycles. Kepler's laws didn't explain anything because they didn't say why the planets should move on elliptical orbits. Newton's law of universal gravitation explained it all in one stroke: because gravity. And that's why we consider Newton the greatest scientist of his era, not Kepler, not Copernicus, not Galileo, but Newton, because he explained the world and didn't just describe it.
Ultimately the advantage is, like you say, that when an explanatory theory fails, we can better know why. When a predictive model fails, we have no clue.
>> BTW Chomsky's point E (which I'd never heard of), the last and most minor, was based on Gold's work.
Gold's negative learnability result was a huge upheaval that led directly to the current paradigm of machine learning. Chomsky used it to support his argument about the poverty of the stimulus, but linguistics was only one of the two fields that Gold's result turned upside down.
And it was a negative result. As I say in another comment, science gives you the tools to know when you're wrong and that's how progress is made, when we find out where we were wrong before.
With epicycles, it took almost two thousand years before we figured out where the model was wrong. Let's hope it doesn't take that long with LLMs and neural nets as well, because I doubt we have another couple of thousand years to spare on a wild goose chase.
>> (I want to stress that the idea of epicycles, the mechanical craftsmanship, and actual prediction of the planets are all amazing genius.)
The epicyclical model persisted for so long because it was so good, and because there was nothing better. It is common for people who don't understand science to look at scientists of the past with derision and think they weren't even scientists, but for almost two thousand years, astronomers did exactly what a scientist must do: they accepted the best available theory, even if many of them hated it with a burning passion (and they did!). If it wasn't for the ancients stumbling and fumbling in the dark for millennia, we wouldn't today be enlightened and we owe them every respect.
Maybe an advantage of an explanatory theory is in revealing more of the "black box", giving more ways to check the theory. (But I'm not sure how this could apply to Newton's gravity, since the only observations were outcomes. And no plausible way to "experiment".)
> If it wasn't for the ancients stumbling and fumbling in the dark for millennia
Is there any evidence that the epicyclic models helped scientific understanding, even indirectly? Later theories didn't seem to build on it. I wonder if it actually detoured understanding, with its misleadingly impressive accuracy, so that understanding would have progressed more quickly without it.
Thinking of pg's "great work" (https://news.ycombinator.com/item?id=36550615): to be the Newton of neural nets would seem the most ambitious aspiration of our times. But it took a bunch of geniuses just to get to Newton... and it seems an even harder problem than planetary motion.
Though a difference is neural nets are based on actual neurons (loosely!).
It's looking like working human-level AI will precede understanding... perhaps by those 2000 years?
The gravity model is similar though: we posit a force that pulls things together, but we don't know /why/ that force seems to exist, any more than the ancients knew /why/ the planets seemed to move in smaller circles along their circular paths. We're really not /that/ enlightened, after all.
I think that's right, but ultimately all explanations we have are based on prior knowledge that is itself not necessarily complete. It's explanations all the way down, until we hit some primary observations or axiomatic assumptions that are the hardest to get rid of.
"Enlightened" was my bad choice of a word. I get overexcited when I think of how much we have learned in the past couple thousand years and I forget that we mainly learned how little we know. Or can explain!
I'd like to think explaining means giving a model simpler than the observations. But this also can be true of a purely predictive model, that offers no "why". Another commenter pointed out that epicycles do simplify - so they do "explain" in this sense.
What defines an "explanation"? What makes something a "why"?
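The epicycles-as-compression point can be made concrete (a toy sketch of mine, stdlib Python only, with an invented example orbit): any periodic path decomposes into a sum of circular motions via its Fourier coefficients, i.e. literally into epicycles, and the fit improves as you add terms. That's exactly what makes the model so predictive while explaining nothing.

```python
import math
import cmath

# Sample a closed orbit in the complex plane: an off-centre ellipse,
# a crude stand-in for a planet's apparent path.
N = 256
ts = [2 * math.pi * n / N for n in range(N)]
orbit = [complex(2 * math.cos(t) + 0.5, math.sin(t)) for t in ts]

def coeff(k):
    # k-th Fourier coefficient = radius and phase of the k-th epicycle.
    return sum(z * cmath.exp(-1j * k * t) for z, t in zip(orbit, ts)) / N

def reconstruct(t, ks):
    # Position at time t using only the epicycles listed in ks.
    return sum(coeff(k) * cmath.exp(1j * k * t) for k in ks)

for ks in ([0], [0, 1], [0, 1, -1]):
    err = max(abs(reconstruct(t, ks) - z) for t, z in zip(ts, orbit))
    print(len(ks), "epicycles, max error:", round(err, 6))
# Errors shrink as terms are added: 2.0, then 0.5, then ~0.
```

So epicycles do "explain" in the compression sense: a few circles summarise hundreds of observations. What they never supply is a reason why those particular circles and no others.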
No, I'm sorry. I'm not a linguist so I only know the relation between Gold's result and linguistics second-hand. I'm more interested in it from the point of view of inductive generalisation in machine learning; that's my schtick.
Just to make sure I didn't hallucinate all that, I did an admittedly perfunctory search online and found this paper:
Whose introduction describes how Gold's result is considered to support the arguments for linguistic nativism from the poverty of the stimulus. Then again, the author doesn't seem to be a linguist himself and he doesn't give any more specific references, so I'm now a little worried; and your question remains unanswered.
Have you tried wading through Chomsky's early work on linguistics? I don't have the courage to. The closest I've got is a friend who has read a couple of Chomsky's linguistics books. My friend is making a living as an astrologist now, so maybe that's a bit of a warning there :P
(not who you asked) I thought this would be in the linked transcript, but it's not. Norvig must be getting it from elsewhere (maybe in the 404ed video?), but it seems like misrepresentation.