I'd add to this: any moderately involved logical or numerical problem causes hallucinations for me on all frontier models.
If you pose such a problem in isolation, the model may write a script to solve it "properly", but I guess that's because enough examples of this kind were added to the training set. That workaround doesn't scale.
As soon as I give the LLM a real problem where only a small part requires numeric reasoning, it almost always hallucinates an answer instead of solving that part with a script.
If the logic/math is part of a larger problem, the miss rate in my experience is near 100%.
LLMs have massive amounts of knowledge, encoded as verbal intelligence, but their logical reasoning is well below that of even an average human.
If you look at how they work (tokenization and embeddings), it seems clear to me that transformers will not solve the issue. The escape hatches, such as handing the calculation off to a script, work only very unreliably.
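To make "solving it with a script" concrete, here is a minimal sketch of the kind of numeric sub-step I mean. The loan scenario and the names monthly_payment and remaining_balance are made up for illustration, not taken from any model transcript. The point is only that the arithmetic is computed exactly rather than guessed token by token.

```python
from fractions import Fraction


def monthly_payment(principal: Fraction, annual_rate: Fraction, months: int) -> Fraction:
    """Fixed payment for an amortizing loan (standard annuity formula), computed exactly."""
    r = annual_rate / 12  # periodic (monthly) interest rate
    if r == 0:
        # No interest: the principal is simply split evenly.
        return principal / months
    growth = (1 + r) ** months
    return principal * r * growth / (growth - 1)


def remaining_balance(principal: Fraction, annual_rate: Fraction,
                      months: int, payment: Fraction) -> Fraction:
    """Simulate the loan month by month and return what is still owed at the end."""
    r = annual_rate / 12
    balance = principal
    for _ in range(months):
        balance = balance * (1 + r) - payment  # accrue interest, then pay
    return balance
```

Calling monthly_payment(Fraction(10000), Fraction(6, 100), 24) and feeding the result into remaining_balance gives a cheap self-check: the balance should come out to exactly zero, which is the sort of verification a guessed number never gets.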