If perplexity on Common Crawl is not available for a model, I will use other benchmarks as a surrogate. This will inherently be a judgement call. If a model has not been announced by EOY 2025 and no benchmarks have been posted publicly, it will not be counted for the purposes of this market.
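For concreteness, here is a minimal sketch of the metric in question: perplexity is exp of the mean token-level negative log-likelihood. The model name and text below are placeholders, not the actual evaluation setup for this market.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Stand-in for a passage drawn from a Common Crawl-style corpus
text = "The quick brown fox jumps over the lazy dog."
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # HF shifts labels internally; .loss is the mean NLL over predicted tokens
    loss = model(input_ids, labels=input_ids).loss

perplexity = torch.exp(loss).item()
print(f"perplexity = {perplexity:.2f}")
```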
"Based on transformers" for the purpose of this question will be anything with multi-headed self-attention that feeds into an MLP.
Update 2025-04-10 (PST) (AI summary of creator comment): Clarification on what constitutes 'based on transformers':
DeepSeek-style MLA with MoE is considered based on transformers.
All current models, except for SSMs and LSTMs, are assumed to be based on transformers.
The status of RWKV remains open for discussion.
Update 2025-04-12 (PST) (AI summary of creator comment): Hunyuan-turbo{s} Classification Update:
Transformer-SSM Hybrid: Despite being a hybrid, it will be counted as non-transformer.
This establishes that for models with mixed components (e.g., transformer and SSM), the creator will treat them as non-transformer for the purposes of market resolution, unless persuaded otherwise.
Of the top 20 lmarena models, hunyuan-turbo{s} is the only non-transformer one. It appears to be a transformer-SSM hybrid, which I'll count as non-transformer. I'm willing to be persuaded on this one. https://cloud.tencent.com/document/product/1729/104753
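Putting the resolution rules above into code, here is an illustrative encoding of how I'd currently classify architectures. The string labels are my own shorthand for this sketch, not an official taxonomy.

```python
def counts_as_transformer(architecture: str) -> bool | None:
    """True if 'based on transformers' per the rules above,
    False if not, None if still open for discussion."""
    arch = architecture.lower()
    if arch in {"ssm", "lstm"}:
        return False   # explicitly excluded
    if "hybrid" in arch:
        return False   # e.g. hunyuan-turbo{s}, a transformer-SSM hybrid
    if arch == "rwkv":
        return None    # status remains open
    return True        # default: MHSA + MLP variants, incl. MLA with MoE

for a in ["mla+moe", "ssm", "transformer-ssm hybrid", "rwkv"]:
    print(a, "->", counts_as_transformer(a))
```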
@ConnorMcCormick oh yeah that's definitely confusing people. Well, better for those of us who do understand it :)
@jacksonpolack The API only refreshes the data every 15 seconds, so if you're quick on the draw, it's totally doable.