Benchmark Gap #9: Once a model solves current software engineering benchmarks, how long until humans don't code? | Manifold



To expand on the title: once an AI beats our current hardest software engineering benchmarks, how many years will it be before humans are no longer hired to do software engineering?

The benchmarks I'll consider for this question are:

SWE-bench Lite: better than 90% resolved.
RE-Bench: mean normalized score >= 1.2*
CodeContests: pass@5 >= 0.9

Once a single model achieves all of these, how many years will it be before "software engineer" (as understood in 2025) is not a job humans get hired for?
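The criteria are a conjunction: one model has to clear all three bars at once. Below is a minimal Python sketch of that check. It makes the following assumptions:

- Scores are reported as fractions, so a SWE-bench Lite resolved rate is a number from 0.0 to 1.0.
- "Better than 90%" means strictly greater than 0.90.
- CodeContests pass@5 is computed with the unbiased pass@k estimator from Chen et al. (2021). The question itself does not say which estimator to use.

The function names are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def dataset_pass_at_k(per_problem: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over a benchmark, given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)

# Thresholds taken from the question text.
SWE_BENCH_LITE_MIN_RESOLVED = 0.90  # must be strictly exceeded
RE_BENCH_MIN_SCORE = 1.2            # mean normalized score, >=
CODECONTESTS_MIN_PASS_AT_5 = 0.9    # pass@5, >=

def meets_all_thresholds(swe_lite_resolved: float,
                         re_bench_mean_score: float,
                         codecontests_pass_at_5: float) -> bool:
    """True only if a single model's results clear all three bars together."""
    return (swe_lite_resolved > SWE_BENCH_LITE_MIN_RESOLVED
            and re_bench_mean_score >= RE_BENCH_MIN_SCORE
            and codecontests_pass_at_5 >= CODECONTESTS_MIN_PASS_AT_5)
```

For example, a model that resolves 91% of SWE-bench Lite, scores 1.2 on RE-Bench and reaches pass@5 of 0.9 on CodeContests would count. A model that resolves exactly 90% would not, whatever its other scores.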

Some notes on resolution:

This market is about humans doing the core work of software engineering - opening tickets, pulling a branch, writing new code, testing it, submitting PRs, etc.
If "software engineer" is still a job title but means something different, the market still resolves.
If software engineers stay employed but their work changes, the market still resolves, e.g. if all software engineers switch from being ICs to "AI managers" of some sort.
If this happens before the benchmarks are beaten, the market resolves to 0.
If these benchmarks undergo minor variations, I'll allow the market to resolve based on either the original or the variant (e.g. if a question is added to RE-Bench, or a different subset of SWE-bench becomes popular).
If there are still some humans doing software engineering here and there, the market still resolves. I'm not really interested in whether some random small companies or government departments refuse to change.
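Read together, the resolution notes reduce to date arithmetic. Below is a sketch of how the resolved value could be computed. It assumes:

- Each of the two events can be pinned to a single date, which in practice is left to judgment.
- Fractional years are acceptable, measured with a 365.25-day year.

The function name is hypothetical.

```python
from datetime import date
from typing import Optional

DAYS_PER_YEAR = 365.25  # assumption: average year length for fractional years

def resolution_years(benchmarks_beaten: Optional[date],
                     humans_no_longer_hired: Optional[date]) -> Optional[float]:
    """Years from the benchmarks being beaten to humans no longer being hired as SWEs.

    Returns None while humans are still hired (the market stays open), and 0.0 if
    hiring stops before the benchmarks are beaten (or they are never beaten).
    """
    if humans_no_longer_hired is None:
        return None
    if benchmarks_beaten is None or humans_no_longer_hired < benchmarks_beaten:
        return 0.0
    return (humans_no_longer_hired - benchmarks_beaten).days / DAYS_PER_YEAR
```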

*The RE-Bench paper lists 0.98 as the average score for testers from METR's professional network, which was their best-performing group of testers. They don't give a variance, so I'm adding a little wiggle room, but I think 1.2 is reasonably close to "peak human".
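For scale, here is the RE-Bench threshold relative to that reported average. This is simple arithmetic on the two numbers above:

```latex
\frac{1.2}{0.98} \approx 1.22
```

So the bar sits roughly 22% above the mean score of METR's best group of human testers.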

This question is managed and resolved by Manifold.

#Technology

#Technical AI Timelines


Related questions

Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use?

Benchmark Gap #5: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, will it be less than two years before AI models are used as entry-level data science / data analysis / statistics workers?

Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 5 years before there are "entry level" AI programmers in industry use?

Benchmark Gap #8: Once a single AI gets >= 80% on FrontierMath Tier 4, how long until an AI publishes a math paper?

In 2029, will any AI be able to construct "reasonably" bug-free code of >= 10k LOC from a natural language specification? (Gary Marcus benchmark #4)

Will an AI model achieve superhuman ELO on Codeforces by the 31 December 2025?

Benchmark Gap #1: Once we have a language model that achieves expert human performance on all *current* major NLP benchmarks, how many years will it be before we have an AI with human-level language skills?

Benchmark Gap #2: Once we have an algorithm with human level sample efficiency for major RL benchmarks, how many years will it be before there is an algorithm with human level sample efficiency on essentially all AAA video game tasks?

Benchmark Gap #4: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, how many months will it be before an AI is listed as a (co) first author on a published math paper?

In 2028, Will it be obvious that software engineers aren't being 10x more productive than in 2022?


© Manifold Markets, Inc.