"Can AI do math for us" is the canonical wrong question. People want self-driving cars so they can drink and watch TV. We should crave tools that enhance our abilities, as tools have done since prehistoric times.
I'm a research mathematician. In the 1980s, I'd ask everyone I knew a question and flip through the hardbound library volumes of Mathematical Reviews, hoping to recognize something. If I was lucky, I'd get a hit in three weeks.
Internet search has shortened this turnaround. One instead needs to guess what someone else might call an idea. "Broken circuits?" Score! Still, time-consuming.
I went all in on ChatGPT after hearing that Terry Tao had learned the Lean 4 proof assistant in a matter of weeks, relying heavily on AI advice. It's clumsy, but a very fast way to get suggestions.
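For a taste of what Lean 4 looks like, here's a toy example of my own (nothing to do with Tao's actual project): a named theorem discharged by a lemma that ships with core Lean 4.

```lean
-- Toy illustration: commutativity of addition on the naturals,
-- proved by invoking the core library lemma Nat.add_comm.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Real formalization work is largely hunting for the right library lemmas, which is exactly where fast AI suggestions help.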
Now, one can hold involved conversations with ChatGPT or Claude, exploring mathematical ideas. AI is often wrong, never knows when it's wrong, but people are like this too. Have you read how insurance incident rates for self-driving taxis are well below human rates? Talking to fellow mathematicians can be frustrating, and so is talking with AI, but AI conversations go faster and can take place in the middle of the night.
I don't want AI to prove theorems for me; those theorems will be as boring as most of the dreck published by humans. I want AI to inspire bursts of creativity in humans.
> AI is often wrong, never knows when it's wrong, but people are like this too.
When talking with various models of ChatGPT about research math, my biggest gripe is that it's either confidently right (10% of my work) or confidently wrong (90%). A human researcher would be right 15% of the time, unsure 50% of the time, and give ideas that are right/helpful (25%) or wrong/a red herring (10%). And only 5% of the time would a good researcher be confidently wrong in the way that ChatGPT so often is.
In other words, ChatGPT completely lacks the meta-layer of "having a feeling/knowing how confident it is", which is so useful in research.
These numbers are just your perception. The way you ask the question will very much influence the output, on certain topics more than others. I get much better results when I share my certainty levels in my questions and say things like "if at all", "if any", etc.
I agree with this approach and use it myself, but these confidence markers can also skew output in undesirable ways. All of these heuristics are especially fragile when the subject matter touches the frontiers of what is known.
In any case, my best experiences with LLMs for pure math research have been for exploring the problem space and ideation -- queries along the lines of "Here's a problem I'm working on ... . Do any other fields have a version of this problem, but framed differently?" or "Give me some totally left-field methods, even if they are from different fields or unlikely to work. Assume I've exhausted all the 'obvious' approaches from field X".
Yeah, blame the users for "using it wrong" (the phrase of the week, I'd say, after the o3 discussions), and then sell the solution as almost-AGI.
PS: I'm starting to see a lot of plausible deniability in some comments about LLMs' capabilities. When LLMs do great => "cool, we are scaling AI". When LLMs do something wrong => "user problem", "skill issues", "don't judge a fish for its ability to fly".
Of course they are; I hoped it was clear that I was just sharing my experience trying to use it for research!
I did in general word it as I would a question to a researcher, which includes uncertainty about whether it's true. E.g. this is from a recent prompt: "is this true in general, if not, what are the conditions for this to be true?"
I think you are imagining a different class of "questions".
To clarify, I was doing research in applied math. My field is not analysis, but I needed to prove some bounds on certain messy expressions (involving special functions, etc.), and analyze an ODE that's not analytically solvable. I used the CoT (chain-of-thought) model a fair bit.
I would ask ChatGPT for hints/ideas/direction in proving various bounds, asking it for theorems or similar results in literature. This is exactly the kind of thing where a researcher would go "yeah this looks like X" or "I think I saw something like this in (book/article name)", or just know a method; or alternatively say they have no clue. ChatGPT most often will confidently give me a "solution", being right 10% of the time (when there's a pretty standard way to do it that I didn't see/know).
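To give a flavor of the kind of task I mean, here's a minimal sketch with a hypothetical stand-in equation (not the actual ODE from my work): something like y' = sin(t) - y^3 has no elementary closed-form solution, so the qualitative bounds have to be proved by hand while the trajectories come from numerics.

```python
# Minimal sketch: numerically integrating an ODE with no closed-form
# solution. The equation is a hypothetical stand-in, not the original one.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    # y' = sin(t) - y^3: bounded forcing plus a strongly damping cubic term
    return np.sin(t) - y**3

sol = solve_ivp(rhs, (0.0, 20.0), [1.0], dense_output=True)
print(sol.y[0, -1])  # approximate value of y at t = 20
```

The hints I wanted were for the hand-proved part, e.g. "this bound looks like a Gronwall-type argument", not for the numerics.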
I think a lot of that is about how it's tuned. It's optimized for questions which can be answered with one big bullet-point answer, and for relatively easy questions that have clear and correct answers. I have yet to encounter a QA bot that will stop and ask clarifying questions before producing its big bullet-point post of answers.
I think this is sensible tuning, in that it's probably what most people who log on to ChatGPT want. Most questions people ask of it will have simple enough answers that require knowledge but not all that much reasoning.
But I see no reason why it couldn't be tuned to be more open ended, less eager to give the correct benchmark/exam answer right away. Indeed in the "internal narrative" of recent models, I see them ask themselves things I wish they asked me!
I think it is every sci-fi dreamer's dream to teach a robot to love.
I don't think AI will think conventionally. It isn't thinking to begin with. It is weighing options. Those options permute, and that is why every response is different.
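As a minimal sketch of what "weighing options" means mechanically, assuming the standard temperature-sampling picture (illustrative only, not any particular model's internals): the model scores candidate next tokens and samples from the resulting distribution, so repeated runs differ.

```python
# Illustrative sketch of temperature sampling: scores (logits) for a few
# candidate tokens become a probability distribution, then one is drawn.
import numpy as np

rng = np.random.default_rng()

def sample_token(logits, temperature=0.8):
    """Sample one token index from softmax(logits / temperature)."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(probs), p=probs)

scores = [2.0, 1.5, 0.3]  # hypothetical scores for three candidate tokens
print([sample_token(scores) for _ in range(5)])  # varies run to run
```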
Absolutely agree. There are some interesting articles in a recent [AMS Bulletin](https://www.ams.org/journals/bull/2024-61-02/home.html?activ...) giving perspectives on this question: what does it do to math if there's a strong theorem prover out there, in what ways can AI help mathematicians, what is math exactly?
I find that a lot of AI+Math work is focused on the end game where you have a clear problem to solve, rather than the early exploratory work where most of the time is spent. The challenge is in making the right connections and analogies, discovering hidden useful results, asking the right questions, translating between fields.
I'm getting ready to launch [Sugaku](https://sugaku.net), where I'm trying to build tools for the above, based on processing the published math literature and training models on it. The kind of search of MR that you mentioned doing is exactly what a computer should do instead. I can create an account for you and would love some feedback.
I agree. I think it comes down to the motivation behind why one does mathematics (or any other field, for that matter). If it's a means to an end, then sure, have the AI do the work and get rid of the researchers. However, that's not why everyone does math. For many it's more akin to why an artist paints. People still paint today even though a camera can produce much more realistic images. It was probably the case (I'm guessing!) that the camera caused a significant drop in jobs for artists-for-hire, for whom painting was just a means to an end (e.g. creating a portrait), but the artists who were doing it for the sake of art survived, and were presumably made better by the ability to see photos of other places they wanted to paint and art from other artists.
> Talking to fellow <humans> can be frustrating, and so is talking with AI, but AI conversations go faster and can take place in the middle of the night.
I made a slight change to generalise your statement; I think you have summarised the actual marketing opportunity.
> People want self-driving cars so they can drink and watch TV. We should crave tools that enhance our abilities, as tools have done since prehistoric times.
Improved tooling and techniques have given humans the free time and resources needed for arts, culture, philosophy, sports, and simply enjoying life! Fancy telecom technologies have allowed me to work from home, and I love it :)
I think I'm missing your point? You still want to enjoy doing math yourself? Is that what you are saying? So you equate "Can AI do math in my place?" with "Can AI drink and watch TV in my place?"
Ingredients for a top HN comment on AI include some nominal expert explaining why, actually, labor won't be replaced and it will be a collaborative process, so you don't need to worry, sprinkled with a little bit of "the status quo will stay still even though this tech only appeared in the last 2 years".
Totally, and I've been working with attention since at least 2017. But I'm colloquially referring to the real breakout and the substantial scale-up in resources being thrown at it.
AI will not do math for us, but maybe eventually it will lead to another mainstream tool for mathematicians. Along with R, Matlab, Sage, GAP, Magma, ...
It would be interesting if in the future mathematicians are just as fluent in some (possibly AI-powered) proof-verifying tool as they are with LaTeX today.
Can AI solve "toy" math problems that computers have not previously been able to solve? Yes. Can AI produce novel math research? No, it hasn't yet. So "AI will not do math for us" is only factually wrong if you take the weaker definition of "doing math for us". The stronger definition is not factually wrong yet.
More problematic is that the statement doesn't specify a timeline. 1 year? Probably not. 10 years? Probably. 20 years? Very likely. 100 years? None of us here will be alive to be proven wrong, but I'll venture that it's a certainty.
This is a pretty strong position to take in the comments of a post where a mathematician declared the 5 problems he'd seen to be PhD level, and speculated that the real difficulty with switching from numerical answers to proofs will be finding humans qualified to judge the AI's answers.
I will agree that it's likely none of us here will be alive to be proven wrong, but that's in the 1 to 10 year range.
Solving PhD-level problems != producing new lines of research. PhD students are typically given problems their advisors know are solvable but difficult, and that might contribute in some way to a larger scope of research the student doesn't yet understand or hasn't earned the "right" to explore on their own. And PhD students do frequently do their own research exploration, but that doesn't involve solving these kinds of "PhD-level" problems, which just require knowledge of the available techniques and creativity in applying them (as evidenced by the poster noting how they could solve some of these on their own fairly quickly).
I don't see how my position is so exceptionally strong. I'm saying there's a 55-70% probability that this happens in the 1-10 year time frame; at 1-20 years it goes up to 70-90%. It's still important to leave room for doubt that we might miss something or be unable to build it for a long time. Stating otherwise seems like an even stronger position to me.
Yeah, I made the right reply, but to the wrong person. bubble12345 was confidently wrong, and bufferoverflow got downvoted for correcting him, but your caveats to his answer were fine; and that probability distribution is well within the sane range, even if mine is substantially tighter.
Your optimism should be tempered by the downside of progress: in the near future, AI may not only inspire creativity in humans but replace human creativity altogether.
Why do I need to hire an artist for my movie/video game/advertisement when AI can replicate all the creativity I need?