Specifically, this resolves YES if:
(1) A new benchmark is announced before the end of 2025; and
(2) The best AI result published within three months after the announcement is less than half of the human-level target. (For example, if human-level performance is claimed to be 80%, an AI would need to reach at least 40% within that window to prevent a YES resolution.)
If multiple new benchmarks are created in 2025, this will resolve YES if condition 2 is true for any of them.
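As a rough illustration, the rule above can be sketched in Python. This is only one reading of the criteria, not an official resolver. The names `Benchmark`, `add_months`, `condition_2` and `resolves_yes` are hypothetical. Treating "three months" as three calendar months, and treating a benchmark with no published AI result in the window as not satisfying condition 2, are both assumptions the criteria do not spell out.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass
class Benchmark:
    announced: date                       # announcement date
    human_target: float                   # claimed human-level score, e.g. 80.0
    ai_results: list[tuple[date, float]]  # (publication date, AI score)


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day if needed."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def condition_2(b: Benchmark) -> bool:
    """Best AI score in the three-month window is below half the human target."""
    deadline = add_months(b.announced, 3)  # assumption: calendar months
    in_window = [s for d, s in b.ai_results if b.announced <= d <= deadline]
    if not in_window:
        # Assumption: the criteria do not say what happens with no results.
        return False
    return max(in_window) < b.human_target / 2


def resolves_yes(benchmarks: list[Benchmark]) -> bool:
    """YES if any benchmark announced by the end of 2025 meets condition 2."""
    cutoff = date(2025, 12, 31)
    return any(b.announced <= cutoff and condition_2(b) for b in benchmarks)
```

With a human-level target of 80, a best in-window score of 35 satisfies condition 2, while a score of exactly 40 does not.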
Update 2025-09-01 (PST) (AI summary of creator comment): Resolution Criteria Update:
CPU or compute cost caps will be ignored when evaluating the AI performance.
@Nick6d8e Hmm, good question. I'm interested in comparing with o3's performance on ARC-AGI-1, and I understand they spent up to $1,000 per question, so I think I'll ignore the CPU cap.