Liam Price has no background in formal mathematics and hasn’t started university yet, but last month he achieved a notable breakthrough in mathematical research — using ChatGPT as his tool.
Working from his home in southwest England, Price guided the widely used AI tool to crack Erdős problem #1196, one of over 1,000 puzzles gathered by the Hungarian mathematician Paul Erdős (1913–1996) during his lifetime. What sets this AI-assisted solution apart from other AI-generated mathematical results is that it employed an approach that caught experts off guard (B. Alexeev et al. Preprint at arXiv 2026).
Sharing the news on the social media platform X, Stanford mathematician Jared Duker Lichtman compared the discovery to a novel chess opening — one that no human had considered before, he explained, because of “human aesthetics and convention.”
This achievement stands out among a growing list of AI breakthroughs in mathematics. Academic researchers and teams at AI companies have been aggressively exploring the limits of what these systems can accomplish. Computers are now doing more than just crunching numbers; they’re engaging in the kind of rigorous, logical reasoning that has been the hallmark of mathematicians since Euclid, more than 2,300 years ago.
Often, these advances have come from general-purpose large language models (LLMs) like GPT, Gemini, and Claude — systems that received no specialized mathematical training. And as is typical across the broader AI landscape, the pace of progress has been breathtaking.
For now, most of these systems are still repurposing methods they’ve picked up from existing literature. That was indeed the case with some of the earlier Erdős problems that Price and his collaborator Kevin Barreto, a mathematics undergraduate at the University of Cambridge in the UK, initially solved.

Artificial intelligence has proposed an unusual solution to a puzzle posed by Hungarian mathematician Paul Erdős. Credit: George Csicsery
However, with challenges like Erdős problem #1196, mathematicians have begun to see signs of genuine originality in what the models produce — the tools making unexpected links between different branches of mathematics. “It really is remarkable,” says Sébastien Bubeck, a mathematician at OpenAI in San Francisco, California. “A year ago, people assumed there might be some fundamental barrier — that LLMs would never be able to go beyond what they’d learned during training.”
Bubeck and others now believe it’s only a matter of time before AI independently contributes at the level of history’s greatest mathematicians — and perhaps even beyond that. “I’m hopeful that by 2030, AI and human mathematicians might together earn a Fields Medal,” says Thang Luong, who leads the Superhuman Reasoning team at Google DeepMind in Mountain View, California.
Fresh Approaches
Erdős first introduced problem #1196 in 1966. It deals with “primitive” sets of whole numbers: sets in which no number divides any of the others evenly. (The prime numbers are the most familiar example of a primitive set.)
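To make the definition concrete, here is a minimal Python sketch that tests whether a finite collection of positive whole numbers is primitive. It only illustrates the definition and is not part of the proof or of any tool mentioned here. The function name `is_primitive` is a hypothetical one chosen for this example.

```python
from itertools import combinations


def is_primitive(numbers):
    """Return True if no element of `numbers` divides a different element.

    Illustrative helper (hypothetical name); expects positive whole numbers.
    """
    values = sorted(set(numbers))
    if any(v < 1 for v in values):
        raise ValueError("expected positive whole numbers")
    # After sorting, combinations() yields pairs (a, b) with a < b,
    # so only the smaller value can possibly divide the larger one.
    return all(b % a != 0 for a, b in combinations(values, 2))


print(is_primitive([2, 3, 5, 7, 11]))  # the first few primes
print(is_primitive([6, 10, 15]))       # no primes, yet none divides another
print(is_primitive([3, 5, 15]))        # 3 and 5 both divide 15
```

The first two calls report a primitive set and the third does not, which shows that a set does not need to consist of primes to be primitive.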

According to discussions across several online forums, previous attempts to solve problem #1196 relied on the framework of probability theory, so those efforts started by translating the problem into probabilistic terms. GPT, by contrast, tackled the problem directly in its original mathematical formulation — and yet its answer implicitly forged a connection between number theory and probability, notes Terence Tao, a mathematician at the University of California, Los Angeles.
Daniel Litt, a mathematician at the University of Toronto in Canada, describes the result as “reasonably interesting,” setting it apart from recent AI-generated solutions to other Erdős problems. Litt has been fairly underwhelmed by what AI has accomplished so far and is skeptical of the enthusiasm surrounding these achievements. But he argues that doubting AI’s long-term potential is misguided.
In fact, he finds it surprising that AI systems aren’t already making major discoveries. Their grasp of existing mathematics is beyond human capacity, and they’ve demonstrated impressive reasoning ability. On top of that, they never get tired or lose motivation.
“Part of the puzzle is that we don’t fully understand what makes a human mathematician excel,” Litt adds, noting that it remains unclear whether humans possess some inherent quality that makes them uniquely creative.
The Challenge of Proof
As with much of AI development, continued improvements, especially from scaling up computing power and refining algorithms, will make these models increasingly capable. One significant limitation of AI-generated mathematics today is that current models can produce proofs no longer than about three or four pages. Models being tested internally at Google can already do better, Luong notes, and may soon reach around ten pages.
“A hundred pages isn’t within reach just yet, but we’re working toward it and seeing steady progress,” Luong explains — though he cautions that this will bring its own complications. Already, human reviewers are stretched thin trying to verify the correctness of standard mathematical papers, and the flood of AI-generated submissions is making the situation worse. “AI models are capable of generating something that appears very convincing, and it requires considerable effort to determine whether a mistake is hidden inside,” says Lauren Williams, a mathematician at Harvard University in Cambridge, Massachusetts.
Like researchers across many fields, she’s deeply concerned about the spread of what’s being called “AI slop.” “You can talk to numerous journal editors in mathematics who have alarming stories to share,” Williams says.



