Benchmark Gap #9: Once a model solves current software engineering benchmarks, how long until humans don't code?


To expand on the title: once an AI beats our current hardest engineering benchmarks, how many years will it be before humans are not hired to do software engineering anymore?

The benchmarks I'll consider for this question are:

Once a single model achieves all of these, how many years will it be before "software engineer" (as understood in 2025) is not a job humans get hired for?

Some notes on resolution:

  • This market is about humans doing the core work of software engineering - opening tickets, pulling a branch, writing new code, testing it, submitting PRs, etc.

  • If "software engineer" is still a job title but means something different, the market still resolves.

  • If software engineers stay employed but their work changes, the market still resolves - e.g. if all software engineers switch from being ICs to "AI managers" of some sort.

  • If humans stop being hired for software engineering before the benchmarks are beaten, then the market resolves to 0.

  • If these benchmarks undergo minor variations, I'll allow the market to resolve based on either the original or the variant (e.g. if a question is added to RE-bench, or a different subset of SWE-bench becomes popular).

  • If there are still some humans doing software engineering here and there, the market still resolves - I'm not really interested in whether some random small companies or government departments refuse to change.
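To make the arithmetic concrete, here is a minimal sketch of how the resolution value could be computed from two dates. The function name `resolution_years`, the 365.25-day year, and the example dates are all assumptions for illustration, not part of the market's official rules; deciding when each event actually happened is still a judgment call.

```python
from datetime import date
from typing import Optional


def resolution_years(benchmarks_beaten: date,
                     hiring_ended: Optional[date]) -> Optional[float]:
    """Hypothetical helper: years from the benchmarks being beaten to the
    point where humans are no longer hired as software engineers."""
    if hiring_ended is None:
        # The job still exists, so the market cannot resolve yet.
        return None
    if hiring_ended <= benchmarks_beaten:
        # The job disappeared before the benchmarks were beaten: resolves to 0.
        return 0.0
    # Approximate a year as 365.25 days (an assumption of this sketch).
    return (hiring_ended - benchmarks_beaten).days / 365.25


# Made-up dates, purely for illustration.
print(resolution_years(date(2027, 1, 1), date(2030, 1, 1)))
print(resolution_years(date(2027, 1, 1), date(2026, 1, 1)))
```

The second call illustrates the "resolves to 0" note: the gap is never negative.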

*The paper lists 0.98 as the average score for testers from METR's professional network, which was their best group of testers. They don't give a variance, so I'm adding a little wiggle room, but I think this is reasonably close to "peak human".

