Will a Mamba 7B model trained on 2 trillion tokens outperform Llama2-13B?
66% chance
This question will resolve YES if someone trains a Mamba (https://twitter.com/tri_dao/status/1731728602230890895) language model with <= 7.5 billion parameters on <= 2 trillion tokens that outperforms Llama2-13B on the Hugging Face Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
This question is managed and resolved by Manifold.
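As a rough illustration of the resolution check (not part of the market mechanics), here is a minimal Python sketch. It assumes "outperforms" means a higher average score across the six benchmarks the leaderboard averaged at the time (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K), alongside the parameter and token caps from the description. The function names and every numeric value in the example are hypothetical placeholders, not figures from the leaderboard.

```python
# Sketch of the resolution criteria. All scores below are made-up placeholders;
# the real Llama2-13B average must be read off the Open LLM Leaderboard.

BENCHMARKS = ("arc", "hellaswag", "mmlu", "truthfulqa", "winogrande", "gsm8k")

def leaderboard_average(scores: dict) -> float:
    """Unweighted mean over the six leaderboard benchmarks (scores in %)."""
    return sum(scores[b] for b in BENCHMARKS) / len(BENCHMARKS)

def would_resolve_yes(mamba_scores: dict,
                      llama2_13b_average: float,
                      n_params_billion: float,
                      n_tokens_trillion: float) -> bool:
    """Check the three conditions: parameter cap, token cap, higher average."""
    return (n_params_billion <= 7.5
            and n_tokens_trillion <= 2.0
            and leaderboard_average(mamba_scores) > llama2_13b_average)

if __name__ == "__main__":
    # Hypothetical per-benchmark scores for an imagined Mamba run.
    hypothetical_scores = {
        "arc": 60.0, "hellaswag": 80.0, "mmlu": 55.0,
        "truthfulqa": 40.0, "winogrande": 74.0, "gsm8k": 25.0,
    }
    llama2_13b_avg = 55.0  # placeholder baseline, not the real leaderboard value
    print(would_resolve_yes(hypothetical_scores, llama2_13b_avg,
                            n_params_billion=7.0, n_tokens_trillion=2.0))
```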
Related questions
Before 2028, will any AI model achieve the same or greater benchmarks as o3 high with <= 1 million tokens per question? (69% chance)
What will be true of the first model to cross 1400 on lmarena.ai?
When will OpenAI release a more capable LLM?
Will anyone train a TokenFormer model at scale before 2026? (25% chance)
Will a flagship (>60T training bytes) open-weights LLM from Meta which doesn't use a tokenizer be released in 2025? (43% chance)
Will the next major LLM by OpenAI use a new tokenizer? (77% chance)
Will a single model running on a single consumer GPU (<1.5k 2020 USD) outperform GPT-3 175B on all benchmarks in the original paper by 2025? (86% chance)
Will the Jan 2024 version of the LLM detector "Binoculars" be effective against OpenAI's best model at end 2024? (59% chance)
How many active parameters will the largest Llama 3 have? (77% chance)
Will a text model achieve 100% performance on the MMLU in five years? (28% chance)