Is scale unnecessary for intelligence (<10B param human-competitive STEM model before 2030)?
2030
60% chance

Resolves YES if, before 2030, a neural net with <10B parameters achieves all of: >75% on GPQA, >80% on SWE-bench Verified, and >95% on MATH.

Arbitrary scaffolding allowed (retrieval over fixed DB is ok), no talking with other AI, no internet access. We'll use whatever tools are available at the time to determine whether such an AI memorized the answers to these datasets; if verbatim memorization obviously happened, the model will be disqualified.

Edit: we'll allow up to 1 minute of time per question.

Possible clarification from creator (AI generated):

  • Model must complete each question within 1 minute of wall-clock time
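The criteria above can be written as a single predicate. This is only a rough sketch of how one might check a candidate model: the `Candidate` class, its field names, and the `resolves_yes` function are hypothetical, scores are assumed to be percentages, and the memorization flag stands in for whatever detection tools exist at resolution time.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of one model's claimed results (all names are illustrative)."""
    params_billions: float            # total trained parameters, in billions
    gpqa: float                       # GPQA accuracy, in percent
    swe_bench_verified: float         # SWE-bench Verified resolve rate, in percent
    math: float                       # MATH accuracy, in percent
    max_seconds_per_question: float   # worst-case wall-clock time per question
    verbatim_memorization: bool = False  # True if memorization obviously happened


def resolves_yes(c: Candidate) -> bool:
    """Return True only if every stated criterion holds at once."""
    return (
        c.params_billions < 10              # strictly under 10B parameters
        and c.gpqa > 75
        and c.swe_bench_verified > 80
        and c.math > 95
        and c.max_seconds_per_question <= 60  # 1 minute cap from the edit
        and not c.verbatim_memorization       # disqualified if memorized
    )


if __name__ == "__main__":
    example = Candidate(8.0, 76.0, 81.0, 96.0, 45.0)
    print(resolves_yes(example))
```

Note that the thresholds are strict inequalities as written in the description, so a model scoring exactly 80% on SWE-bench Verified would not qualify under this reading.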


New research from Meta describing "Memory Layers", which resemble both attention and a vector DB lookup over keys in the model's latent space.
https://ai.meta.com/research/publications/memory-layers-at-scale/
I think it's quite clear that active params will end up being a smaller and smaller proportion of a model's data (MoEs were only the beginning of this), with most parameters used very sparsely in the same vein as associative memory. My sense is that techniques like these don't count under this question's resolution criteria (since they're trained parameters), but they do point to the same principle.

I didn't explicitly mention wall-clock time, but I said "Arbitrary scaffolding allowed" so unless anyone objects I'll add "Must use below X minutes of wall-clock per question". I am conflicted between 1 minute (upper bound on how long users would be willing to wait) and something higher since the spirit of this question is upper-bound-y.

@JacobPfau Added 1 minute cap. Since we're talking about 10B models on arbitrarily optimized hardware, this isn't much of a constraint. I expect that'll allow >100k tokens/question.
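As a back-of-the-envelope check on that figure, here is the decoding throughput a model would need to sustain to emit 100k tokens inside the 1 minute cap. The function name is made up for illustration, and the calculation assumes purely sequential generation with no time spent on prompt processing or tool calls.

```python
def required_tokens_per_second(tokens: int, seconds: float) -> float:
    """Sustained throughput needed to generate `tokens` within `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


if __name__ == "__main__":
    # 100k tokens in a 60 second budget
    rate = required_tokens_per_second(100_000, 60.0)
    print(f"{rate:.0f} tokens/s")
```

That works out to roughly 1,700 tokens per second per question; whether that is achievable depends on the hardware assumed.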

bought Ṁ100 YES

Preregistering my confidence that small models operating closely with large external DBs will turn out to be pretty darn smart.

@AdamK Care to share your reasoning?

@JoeBoyle I don't think I ought to share it. No point giving up so much alpha while prices for downstream markets remain this good.

@AdamK Okay

@JoeBoyle Sorry that I'll have to wait, but here's this to keep me honest: de3be1f4472c9adb4a479b97d140d6615b7189536d8916d52c5426aa0291fd28

I may consider sharing by April or so.
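For anyone wanting to check a hash commitment like the one above once the reasoning is revealed, here is a minimal sketch. It assumes the 64-hex-character string is a SHA-256 digest of the UTF-8 text, possibly with a secret salt prepended; the comment does not state the hash function or the exact encoding, so both are guesses, and the function names are hypothetical.

```python
import hashlib
import hmac


def commit(text: str, salt: str = "") -> str:
    """Return the hex SHA-256 digest of salt + text (assumed commitment scheme)."""
    return hashlib.sha256((salt + text).encode("utf-8")).hexdigest()


def verify(text: str, digest: str, salt: str = "") -> bool:
    """Check a revealed text (and optional salt) against a published digest."""
    # compare_digest performs a constant-time comparison of the two strings
    return hmac.compare_digest(commit(text, salt), digest.lower())


if __name__ == "__main__":
    # Illustrative values only, unrelated to the commitment posted above.
    published = commit("my reasoning goes here", salt="some-secret-salt")
    print(verify("my reasoning goes here", published, salt="some-secret-salt"))
    print(verify("a different story", published, salt="some-secret-salt"))
```

A salt matters when the committed text is short or guessable, since otherwise someone could brute-force candidate texts against the published digest.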
@JacobPfau I'm also happy to bet YES on a "before 2027" version of this market.

@AdamK Yea given qwen/o1 progress I agree that 2027 is possible. I've made a question here https://manifold.markets/JacobPfau/is-scale-unnecessary-for-intelligen?play=true

@AdamK Link is dead mate

bought Ṁ200 NO

The title is a bit misleading, because I think this is theoretically possible but just won't happen before 2030

@SaviorofPlant If you have a better title I will consider editing.

@JacobPfau If you can spare the characters, "before 2030" inside the parentheses clears it up.

Do current larger models reach those scores? Or is improvement AND compression currently necessary?

@KimberlyWilberLIgt Improvement on SWE-bench Verified is necessary. The others have roughly been hit by o1. I chose these numbers as my sense of in-domain expert performance.

bought Ṁ50 YES

You describe it as the opposite of the title

@IasonKoukas Thanks for catching this @CraigDemel

Pinging @AdamK to make sure your limit orders are in the right direction

Title doesn't match question in text. Should it be "is scale unnecessary"?
