In what year will AI achieve a score of 95% or higher on the PhysBench leaderboard?

Background

PhysBench is a 10,000-item video-image-text benchmark that tests whether a vision–language model (VLM) can reason about the real-world physics that governs everyday objects and scenes. It covers four domains (object Properties, object Relationships, Scene understanding and future-state Dynamics), split into 19 fine-grained tasks such as mass comparison, collision outcomes and fluid behaviour. Unlike on most other benchmarks, humans still outperform AI on PhysBench.

State of play:

• Human reference accuracy: 95.87%

• 2024 AI accuracy (OpenAI o1): 55.11%

Why reaching human-level accuracy on PhysBench is a big milestone:

Physics-consistent video generation – A model that masters all four PhysBench domains should be able to create long-form videos, ads or even feature films in which liquids pour, cloth folds and shadows move exactly as they would in the real world, eliminating the physics mistakes seen in today’s AI-generated videos.

PhysBench is the litmus test for whether next-generation multimodal models can move from “smart autocomplete” to physically grounded intelligence, a prerequisite for everything from autonomous robots to feature films.

Resolution Criteria

This market resolves to the year bracket in which a fully automated AI system first achieves an average accuracy of 95% or higher (human‑level) on the PhysBench ALL metric.

Verification – The claim must be confirmed by either
1. a peer‑reviewed paper on arXiv, or
2. a public leaderboard entry on the official PhysBench website or another credible source.
Compute resources – Unlimited.

Fine Print:

If the resolution criteria are unsatisfied by Jan 1, 2041, the market resolves to “Not Applicable.”

Update 2025-07-20 (PST) (AI summary of creator comment): The creator has confirmed that only one answer will resolve to YES. This will be the year bracket in which the milestone is first achieved. All other brackets will resolve to NO.
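
The resolution rule described above (exactly one bracket resolves YES, every other bracket resolves NO, and the whole market resolves N/A if the milestone is not reached before Jan 1, 2041) can be sketched in a few lines of Python. This is only an illustration, not Manifold's implementation: the two-year bracket layout starting at 2025 is an assumption inferred from the bracket names mentioned in the comments ("2025-2026", "2029-2030", "2031-2032"), and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumptions (not stated explicitly in the market description):
FIRST_YEAR = 2025      # start of the first bracket, inferred from "2025-2026"
LAST_YEAR = 2040       # last full year before the Jan 1, 2041 deadline
BRACKET_WIDTH = 2      # two-year brackets, inferred from the comment thread


def year_brackets() -> List[Tuple[int, int]]:
    """Return the (start, end) years of each bracket, inclusive."""
    return [
        (start, start + BRACKET_WIDTH - 1)
        for start in range(FIRST_YEAR, LAST_YEAR + 1, BRACKET_WIDTH)
    ]


def resolve_market(first_year_achieved: Optional[int]) -> Dict[str, str]:
    """Map each bracket label to its resolution.

    first_year_achieved is the year in which an AI system first scores
    95% or higher on the PhysBench ALL metric, or None if that has not
    happened before the deadline.
    """
    brackets = year_brackets()

    # Milestone not reached before Jan 1, 2041: everything resolves N/A.
    if first_year_achieved is None or first_year_achieved > LAST_YEAR:
        return {f"{lo}-{hi}": "N/A" for lo, hi in brackets}

    if first_year_achieved < FIRST_YEAR:
        raise ValueError("year precedes the first bracket")

    # Only the bracket containing the year resolves YES; all others NO.
    return {
        f"{lo}-{hi}": "YES" if lo <= first_year_achieved <= hi else "NO"
        for lo, hi in brackets
    }
```

For example, `resolve_market(2029)` marks the 2029-2030 bracket YES and all other brackets, earlier and later, NO, which matches the creator's clarification.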

This question is managed and resolved by Manifold.

#Technology

#AI

#Technical AI Timelines

#OpenAI

#AI Impacts

9 Comments

Just want to confirm (since each bracket is set to trade individually right now) that only one bracket will resolve YES, and not all subsequent brackets. i.e. if the condition is met in 2029, then the 2029-2030 bracket resolves YES but the ones for 2031-2032, etc. will all resolve NO?

@eapache I’m new here. Maybe you can help. What is wrong with resolving “2025-2026” as “YES” and then resolving all the others as “NO”? If the system allows for it, I feel like the ones that selected “NO” in the other brackets should be rewarded.

@AlanTuring Nothing wrong with that at all! I just wanted to be sure that’s what you meant, since you set up the question such that you could resolve multiple to YES if you wanted. I believe there is a different way to set up the market such that only one option can be resolved YES by default.

@eapache I’m confused by what you said there. If this benchmark is solved in 2025 then only the bracket 2025-2026 will be resolved YES. I don’t see any justification for resolving the other brackets as YES because the year 2025 is not contained in them. However, they will be resolved NO, which rewards the people who bet NO on them: they correctly said the benchmark would not be solved during that bracket.

@AlanTuring Manifold supports two different kinds of markets: one kind where each option is independent, and another kind where the options are linked. You chose the first kind for this market, but based on your description I think you probably meant to choose the second.

@AlanTuring that doesn’t invalidate the market or anything, it just was a bit confusing about your intended meaning, but you have clarified it. Thanks!

@eapache What is the name of each market type? When you create the question, which option do you select in each scenario?

Thanks for the clarification.

@AlanTuring For regular questions, the difference is between the “Multiple Choice” option (linked answers, only one can be YES) and the “Set” option (independent answers, like this market). I’m not sure how those work with date-type markets like this; I’m not as familiar with Manifold’s recent changes in this area.

@eapache this market is a date-type market.

I’m personally fine with resolving the other brackets as NO because it doesn’t hurt the main bracket that was selected as YES. It also benefits the people that selected NO for that particular bracket. I only see upside here.

For example, if I award YES to 2025-2026 and NO to 2031-2032, both groups benefit for answering correctly.
