"Can AI do math for us" is the canonical wrong question. People want self-driving cars so they can drink and watch TV. We should crave tools that enhance our abilities, as tools have done since prehistoric times.
I'm a research mathematician. In the 1980s, I'd ask everyone I knew a question and flip through the hardbound library volumes of Mathematical Reviews, hoping to recognize something. If I was lucky, I'd get a hit in three weeks.
Internet search has shortened this turnaround. One instead needs to guess what someone else might call an idea. "Broken circuits?" Score! Still, time-consuming.
I went all in on ChatGPT after hearing that Terry Tao had learned the Lean 4 proof assistant in a matter of weeks, relying heavily on AI advice. It's clumsy, but a very fast way to get suggestions.
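For a taste of what Lean 4 looks like, here's a toy example of my own (nothing to do with Tao's actual project): a named theorem discharged by a lemma that ships with core Lean 4.

```lean
-- Toy illustration: commutativity of addition on the naturals,
-- proved by invoking the core library lemma Nat.add_comm.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Real formalization work is largely hunting for the right library lemmas, which is exactly where fast AI suggestions help.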
Now, one can hold involved conversations with ChatGPT or Claude, exploring mathematical ideas. AI is often wrong, never knows when it's wrong, but people are like this too. Have you read how insurance incident rates for self-driving taxis are well below human rates? Talking to fellow mathematicians can be frustrating, and so is talking with AI, but AI conversations go faster and can take place in the middle of the night.
I don't want AI to prove theorems for me; those theorems will be as boring as most of the dreck published by humans. I want AI to inspire bursts of creativity in humans.
> AI is often wrong, never knows when it's wrong, but people are like this too.
When talking with various models of ChatGPT about research math, my biggest gripe is that it's either confidently right (10% of my work) or confidently wrong (90%). A human researcher would be right 15% of the time, unsure 50% of the time, and give ideas that are right/helpful (25%) or wrong/a red herring (10%). And only 5% of the time would a good researcher be confidently wrong in the way that ChatGPT so often is.
In other words, ChatGPT completely lacks the meta-layer of "having a feeling/knowing how confident it is", which is so useful in research.
These numbers are just your perception. The way you ask the question will very much influence the output, on certain topics more than others. I get much better results when I share my certainty levels in my questions and say things like "if at all", "if any", etc.
I agree with this approach and use it myself, but these confidence markers can also skew output in undesirable ways. All of these heuristics are especially fragile when the subject matter touches the frontiers of what is known.
In any case, my best experiences with LLMs for pure math research have been for exploring the problem space and ideation -- queries along the lines of "Here's a problem I'm working on ... . Do any other fields have a version of this problem, but framed differently?" or "Give me some totally left-field methods, even if they are from different fields or unlikely to work. Assume I've exhausted all the 'obvious' approaches from field X".
Yeah, blame the users for "using it wrong" (the phrase of the week, I'd say, after the o3 discussions), and then sell the solution as almost-AGI.
PS: I'm starting to see a lot of plausible deniability in some comments about LLMs' capabilities. When LLMs do great => "cool, we are scaling AI". When LLMs do something wrong => "user problem", "skill issues", "don't judge a fish for its ability to fly".
Of course they are; I hoped it was clear that I was just sharing my experience trying to use it for research!
I did in general word it as I would a question to a researcher, which includes uncertainty about whether it's true. E.g. this is from a recent prompt: "is this true in general, if not, what are the conditions for this to be true?"
I think you are imagining a different class of "questions".
To clarify, I was doing research in applied math. My field is not analysis, but I needed to prove some bounds on certain messy expressions (involving special functions, etc.), and analyze an ODE that's not analytically solvable. I used the CoT (chain-of-thought) model a fair bit.
I would ask ChatGPT for hints/ideas/direction in proving various bounds, asking it for theorems or similar results in literature. This is exactly the kind of thing where a researcher would go "yeah this looks like X" or "I think I saw something like this in (book/article name)", or just know a method; or alternatively say they have no clue. ChatGPT most often will confidently give me a "solution", being right 10% of the time (when there's a pretty standard way to do it that I didn't see/know).
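To give a flavor of the kind of task I mean, here's a minimal sketch with a hypothetical stand-in equation (not the actual ODE from my work): something like y' = sin(t) - y^3 has no elementary closed-form solution, so the qualitative bounds have to be proved by hand while the trajectories come from numerics.

```python
# Minimal sketch: numerically integrating an ODE with no closed-form
# solution. The equation is a hypothetical stand-in, not the original one.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    # y' = sin(t) - y^3: bounded forcing plus a strongly damping cubic term
    return np.sin(t) - y**3

sol = solve_ivp(rhs, (0.0, 20.0), [1.0], dense_output=True)
print(sol.y[0, -1])  # approximate value of y at t = 20
```

The hints I wanted were for the hand-proved part, e.g. "this bound looks like a Gronwall-type argument", not for the numerics.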
I think a lot of that is about how it's tuned. It's optimized for questions which can be answered with one big bullet-point answer, and for relatively easy questions that have clear and correct answers. I have yet to encounter a QA bot that will stop and ask clarifying questions before producing its big bullet-point post of answers.
I think this is sensible tuning, in that it's probably what most people who log on to ChatGPT want. Most questions people ask of it will have simple enough answers that require knowledge but not all that much reasoning.
But I see no reason why it couldn't be tuned to be more open ended, less eager to give the correct benchmark/exam answer right away. Indeed in the "internal narrative" of recent models, I see them ask themselves things I wish they asked me!
I think it is every sci-fi dreamer's dream to teach a robot to love.
I don't think AI will think conventionally. It isn't thinking to begin with. It is weighing options. Those options permute, and that is why every response is different.
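As a minimal sketch of what "weighing options" means mechanically, assuming the standard temperature-sampling picture (illustrative only, not any particular model's internals): the model scores candidate next tokens and samples from the resulting distribution, so repeated runs differ.

```python
# Illustrative sketch of temperature sampling: scores (logits) for a few
# candidate tokens become a probability distribution, then one is drawn.
import numpy as np

rng = np.random.default_rng()

def sample_token(logits, temperature=0.8):
    """Sample one token index from softmax(logits / temperature)."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(probs), p=probs)

scores = [2.0, 1.5, 0.3]  # hypothetical scores for three candidate tokens
print([sample_token(scores) for _ in range(5)])  # varies run to run
```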
Absolutely agree. There are some interesting articles in a recent [AMS Bulletin](https://www.ams.org/journals/bull/2024-61-02/home.html?activ...) giving perspectives on this question: what does it do to math if there's a strong theorem prover out there, in what ways can AI help mathematicians, what is math exactly?
I find that a lot of AI+Math work is focused on the end game where you have a clear problem to solve, rather than the early exploratory work where most of the time is spent. The challenge is in making the right connections and analogies, discovering hidden useful results, asking the right questions, translating between fields.
I'm getting ready to launch [Sugaku](https://sugaku.net), where I'm trying to build tools for the above, based on processing the published math literature and training models on it. The kind of search of MR that you mentioned doing is exactly what a computer should do instead. I can create an account for you and would love some feedback.
I agree. I think it comes down to the motivation behind why one does mathematics (or any other field, for that matter). If it's a means to an end, then sure, have the AI do the work and get rid of the researchers. However, that's not why everyone does math. For many it's more akin to why an artist paints. People still paint today even though a camera can produce much more realistic images. It was probably the case (I'm guessing!) that the camera caused a significant drop in jobs for artists-for-hire, for whom painting was just a means to an end (e.g. creating a portrait), but the artists who were doing it for the sake of art survived, and were presumably made better by the ability to see photos of other places they wanted to paint and art from other artists.
> Talking to fellow <humans> can be frustrating, and so is talking with AI, but AI conversations go faster and can take place in the middle of the night.
I made a slight change to generalise your statement; I think you have summarised the actual marketing opportunity.
> People want self-driving cars so they can drink and watch TV. We should crave tools that enhance our abilities, as tools have done since prehistoric times.
Improved tooling and techniques have given humans the free time and resources needed for arts, culture, philosophy, sports, and simply enjoying life! Fancy telecom technologies have allowed me to work from home, and I love it :)
I think I'm missing your point? You still want to enjoy doing math yourself? Is that what you are saying? So you equate "Can AI do math in my place?" with "Can AI drink and watch TV in my place?"
Ingredients for a top HN comment on AI include some nominal expert explaining why, actually, labor won't be replaced and it will be a collaborative process, so you don't need to worry, sprinkled with a little bit of "the status quo will stay still even though this tech only appeared in the last 2 years".
Totally, and I've been working with attention since at least 2017. But I'm colloquially referring to the real breakout and the substantial scale-up in resources being thrown at it.
AI will not do math for us, but maybe eventually it will lead to another mainstream tool for mathematicians. Along with R, Matlab, Sage, GAP, Magma, ...
It would be interesting if in the future mathematicians are just as fluent in some (possibly AI-powered) proof-verifying tool as they are with LaTeX today.
Can AI solve "toy" math problems that computers have not previously been able to solve? Yes. Can AI produce novel math research? No, it hasn't yet. So "AI will not do math for us" is only factually wrong if you take the weaker definition of "doing math for us". The stronger definition is not factually wrong yet.
More problematic is that the statement doesn't specify a timeline. 1 year? Probably not. 10 years? Probably. 20 years? Very likely. 100 years? None of us here will be alive to be proven wrong, but I'll venture that it's a certainty.
This is a pretty strong position to take in the comments of a post where a mathematician declared the 5 problems he'd seen to be PhD level, and speculated that the real difficulty with switching from numerical answers to proofs will be finding humans qualified to judge the AI's answers.
I will agree that it's likely none of us here will be alive to be proven wrong, but that's in the 1 to 10 year range.
Solving PhD-level problems != producing new lines of research. PhD students are typically given problems their advisors know are solvable but difficult, and that might contribute in some way to a larger scope of research the student doesn't yet understand or hasn't earned the "right" to explore on their own. And PhD students do frequently do their own research exploration, but that doesn't involve solving these kinds of "PhD-level" problems, which just require knowledge of the available techniques and creativity in applying them (as evidenced by the poster noting how they could solve some of these on their own fairly quickly).
I don't see how my position is so exceptionally strong. I'm saying there's a 55-70% probability that this happens in the 1-10 year time frame; at 1-20 years it goes up to 70-90%. It's still important to leave room for doubt that we might miss something or be unable to build it for a long time. Stating otherwise seems like an even stronger position to me.
Yeah, I made the right reply, but to the wrong person. bubble12345 was confidently wrong, and bufferoverflow got downvoted for correcting him, but your caveats to his answer were fine; and that probability distribution is well within the sane range, even if mine is substantially tighter.
Your optimism should be tempered by the downside of progress: in the near future, AI may not only inspire creativity in humans but replace human creativity altogether.
Why do I need to hire an artist for my movie/video game/advertisement when AI can replicate all the creativity I need?