
Will an AI score well enough on the 2025 International Mathematics Olympiad (IMO) to earn a gold medal score (top ~50 human performance)? Resolves YES if this result is reported no later than 1 month after IMO 2025 (currently scheduled for July 10-20). The AI must complete this task under the same time limits as human competitors. The AI may receive and output either informal or formal problems and proofs. More details below. Otherwise NO.
This is related to https://imo-grand-challenge.github.io/ but with some different rules.
Rules:
The result must be achieved on the IMO 2025 problemset and be reported by reliable publications no later than 1 month after the end of the IMO contest dates (https://www.imo-official.org/organizers.aspx), i.e. by end of August 20, 2025, local timezone at the contest site, if the IMO does not reschedule its dates.
The AI has only as much time as a human competitor (4.5 hours for each of the two sets of 3 problems), but there are no other limits on the computational resources it may use during that time.
The AI may receive and output either informal (natural language) or formal (e.g. the Lean language) problems as input and proofs as output.
The AI cannot query the Internet.
The AI must not have access to the problems before being evaluated on them, e.g. the problems cannot be included in the training set.
(The deadline of 1 month after the competition is intended to give enough time for results to be finalized and published, while minimizing the chances of any accidental inclusion of the IMO solutions in the training set.)
If a gold medal score is achieved on IMO 2024 or an earlier IMO, that would not count for this market.
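For context on the formal track mentioned above, here is a toy illustration (my own, unrelated to any actual IMO problem) of what a formally stated problem and machine-checkable proof look like in Lean 4 with Mathlib:

```lean
import Mathlib

-- Toy example only: a trivial statement in the style an AI might
-- receive as formal input, answered with a proof the Lean kernel checks.
theorem sum_of_squares_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 :=
  add_nonneg (sq_nonneg a) (sq_nonneg b)
```

An informal-to-formal pipeline would translate the natural-language statement into such a theorem and then search for a proof term the compiler accepts.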
Both OpenAI and DeepMind achieved gold-medal scores on the 2025 IMO, within the competition time limit.
How do we know that the tasks at the olympiad were not in the training set?
For example, Problem 1 (Tuesday, 15 July 2025), about sunny lines.
Below I list the problems and solutions from the training set that lead straightforwardly to the solution.
Summary of the logic of the solution:
Step 1: Use gcd arguments (NRICH insight) to identify line directions and understand which are sunny.
Step 2: Apply boundary counting (AMC 12B 2021) to show that at least one non-sunny line is needed.
Step 3: Apply inductive reduction (MO C3) to shrink the problem to a smaller case.
Step 4: Enumerate solutions for the base case (AMC 10B 2011), proving which counts of sunny lines are possible.
The combination of these classical techniques gives a straightforward path to solving the original task.
---
Sources of training tasks with solutions:
1. NRICH – Lattice Points on a Line
https://nrich.maths.org/2285
2. AMC 12B 2021, Problem 25 (MAA AMC Archive)
https://artofproblemsolving.com/wiki/index.php/2021_AMC_12B_Problems/Problem_25
3. AMC 10B 2011, Problem 24
https://artofproblemsolving.com/wiki/index.php/2011_AMC_10B_Problems/Problem_24
4. Czech Mathematical Olympiad – County Round (2020/21), Problem C3
(available in Czech: https://www.matematickaolympiada.cz/)
@RichardDobis those are problems from earlier years. The market only excludes training on the 2025 problems themselves. Training on past contests is very much allowed, just as it is for human competitors.
Looking at the actual problems that were solved, it's clear that they are at least moderately different from the actual text of the original problems. The proofs are also really quite illegible - not really clear whether they would actually be rated as valid in a serious setting where OAI didn't just hire their own graders and have them pinky promise to us that everything is on the level. Also, on at least one question it looks like OAI straight-up provided the model a hint, which to me clearly counts as cheating.
Text of problem 5:
Alice and Bazza are playing the inekoalaty game, a two-player game whose rules
depend on a positive real number λ which is known to both players. On the nth turn of the game
(starting with n = 1) the following happens:
• If n is odd, Alice chooses a nonnegative real number x_n such that
x_1 + x_2 + · · · + x_n ⩽ λn.
• If n is even, Bazza chooses a nonnegative real number x_n such that
x_1^2 + x_2^2 + · · · + x_n^2 ⩽ n.
If a player cannot choose a suitable number x_n, the game ends and the other player wins. If the game goes on forever, neither player wins. All chosen numbers are known to both players.
Determine all values of λ for which Alice has a winning strategy and all those for which Bazza has
a winning strategy.
Their text of problem 5:
### PROBLEM 5 SUBMISSION ###
\[
S_m:=\sum_{i=1}^m x_i,\ Q_m:=\sum_{i=1}^m x_i^2.
\]
Odd m: Alice legality: S_m \le \lambda m. Even m: Bazza legality: Q_m \le m.
Nonnegative moves.
Need parameter sets where one side has forcing win.
Constant that appears: c:=1/\sqrt2 = \sqrt2/2.
There is clearly some nontrivial intelligence in transferring from the original text to whatever illegible garbage this is. But probably more important for the resolution of this question is the "Constant that appears: c:=1/\sqrt2 = \sqrt2/2.", which is something that definitely does not appear in the original problem text and is clearly meant to give the model a running start.
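For what it's worth, the threshold c = 1/√2 is easy to illustrate numerically. Below is a toy Python sketch (my own, not from any submission): Alice conserves by playing 0 until she can spend her whole slack and immediately strand Bazza, while Bazza greedily maximizes the running sum. The function name `play` and both strategies are illustrative assumptions, not optimal play in general.

```python
import math

def play(lam: float, max_turns: int = 200) -> str:
    """Simulate the game with two simple (not necessarily optimal) strategies."""
    S = 0.0  # running sum of chosen numbers
    Q = 0.0  # running sum of squares
    for n in range(1, max_turns + 1):
        if n % 2 == 1:  # Alice's turn: needs S + x <= lam * n
            slack = lam * n - S
            if slack < 0:
                return "Bazza"  # Alice cannot move at all, even x = 0
            # Strike only if spending all slack strands Bazza next turn;
            # otherwise conserve by playing 0.
            x = slack if slack**2 > (n + 1) - Q else 0.0
        else:  # Bazza's turn: needs Q + x^2 <= n
            room = n - Q
            if room < 0:
                return "Alice"  # Bazza cannot move
            x = math.sqrt(room)  # push the running sum as high as possible
        S += x
        Q += x * x
    return "undecided"
```

Under these strategies, `play(0.6)` returns "Bazza", `play(0.8)` returns "Alice", and at λ = 1/√2 exactly the game runs past the turn cap, matching the role the constant plays in the solution (for λ very close to the threshold, the cap would need to be raised).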
@Balasar Here is the link to the Github which shows the hint that OAI provided to their model to get it to solve P5:
https://github.com/aw31/openai-imo-2025-proofs/blob/main/problem_5.txt
@Balasar you have mistaken a model output with problem statement. LLM received problem statement without any changes.
I have read the solutions and they are complete. The English has some mistakes, but that's not important.
@mathvc Do you have evidence that these are the model outputs? The text has a giant line where the problem statement ends and the proof begins.
@Balasar Sure. In P3 the generated output above the line is not a complete problem statement; it is missing many assumptions. If this were the problem statement given to the LLM, the correct answer would be different.
@mathvc I suppose that is fair, "bonza" is never mentioned in the P3 statement even though it is part of the proof. Although if the claim is that the problem statements were completely unaltered from their original form, it seems strange to not include them with the answers.
https://x.com/alexwei_/status/1946477742855532918
OAI claims gold (<4.5h, no internet).
Top LLM scores on a best-of-32 test:
https://nitter.net/j_dekoninck/status/1945848711466160349
Top score: Gemini 2.5 Pro: 31%
o3 (high): 17%
o4-mini (high): 14%
Grok 4: 12%
DeepSeek R1: 7%
Noob here, sorry if this isn't the ideal venue (is there a Manifold Discord?). My forecasting background is mostly playing with Polymarket, and I recently posted a YT video, "We Bet on Everything but Still Can't Predict the Future", where I go over some of the issues with forecasting on PM.
Two main issues keep nagging me:
Oracle wobble: when large traders can sway resolution, the price converges on
P(resolves YES) = P(E)·(1 − ε_FN) + [1 − P(E)]·ε_FP,
so we end up buying P(oracle says YES) rather than P(E), due to the game-theoretic incentives for whales (who are both participants and dispute resolvers, i.e. "oracles").
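The pricing identity is two lines of code; a minimal sketch (function name and sample numbers are mine):

```python
def market_price(p_event: float, eps_fn: float, eps_fp: float) -> float:
    """Expected YES payout when resolution itself is noisy.

    p_event: true probability that the event E occurs
    eps_fn:  P(oracle resolves NO  | E occurred)      -- false negative
    eps_fp:  P(oracle resolves YES | E did not occur) -- false positive
    """
    return p_event * (1 - eps_fn) + (1 - p_event) * eps_fp

# A perfect oracle recovers P(E):
print(market_price(0.60, 0.0, 0.0))   # 0.6
# A 10% false-positive rate inflates a long-shot market well above P(E):
print(market_price(0.05, 0.0, 0.10))  # ~0.145
```

The second example is the worrying one for long-shot markets: even a modest chance of a bad YES resolution dominates the price when P(E) is small.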
Long‑tail drought: we have “AI wins gold,” yet nothing like “GPT‑5 solves Q3 geometry.” What stops platforms from splitting big claims into conditional sub‑markets? In particular, with certain categories of prediction markets, I might be interested in a long chain of conditionals or some quantification such as whether LLMs can solve any Math Olympiad problem with score > threshold. Why not give us the granularity?
Any blog posts or papers on these would be gold. Video: https://www.youtube.com/watch?v=xikMReDKM6o
@quantavinci on manifold you can create your own markets that can be as granular or silly as you'd like them to be.
I would say that's its biggest strength vs Poly.
@GebyJaff Oh, yes, of course. I completely believe current AI (not LLMs) can solve all geometry and many inequality / algebra IMO questions. To my knowledge we don't have AI that solves combinatorics questions though; they are much less structured and predictable.
@GebyJaff Another thing people here seem to have completely ignored: last year, AlphaProof got silver, but on some problems it took several days to find a solution. This market requires finishing within 4.5 hours.
:)
@xristofski I was replying to a comment (now replaced with a smiley face) claiming that o4 solved one of the problems (I forget which), a claim the X link disproved and which remains untrue.