Elon Musk is talking big: https://x.com/tsarnick/status/1815493761486708993. Says that Grok 3 will come out in December and 'should be' the most powerful AI in the world.
Resolves to YES if Grok 3 is, at the time of its release, plausibly the most powerful AI in the world according to my best judgment. Has to be at least as strong as all models publicly available at the time.
Resolves to NO if it is not the most powerful.
(Resolves NO if no such model is released by 7/23/25, to ensure this doesn't go on forever.)
As of 7/23/2024, Claude Sonnet 3.5 is IMO the most powerful AI, but GPT-4o would also resolve to YES based on its #1 position on Arena and the other ways in which some people prefer it. Gemini 1.5 Pro or Advanced would not qualify, but would have counted prior to Sonnet 3.5 and GPT-4o.
(I will not take clarifying questions on my criteria here, it will be my subjective take on 'is this plausibly the best LLM I can access right now.')
Update 2025-05-01 (PST): Reasoning models are a different class of AI and do not count for the purposes of resolving this market. (AI summary of creator comment)
I do think both interpretations are reasonable, and I could argue both sides. I understand both cases, although I would still be inclined to make the same decision again.
But I have learned that once you make a decision like this, you HAVE TO stick with it; reversing yourself makes things go crazy. Even if you decide your initial decision was wrong, the only options after that are to turn it over to the mods or stick with what you said.
Given that there are 5-0 thumbs up on an accusation that my actions are disingenuous (and I've been outright accused of LYING among other things, seriously WTAF), despite the market being where it was 2 days before the ruling, which I REALLY REALLY don't appreciate, I don't need this trouble. I hereby ask the mods to take over this question so I can wash my hands of it, and they can do whatever they decide is best.
Hope everyone's happy now. Enjoy.
@turtle6agqe I don’t think it’s disingenuous at all. Where’s the dishonesty? I don’t see a personal benefit to clarifying the market in either way
@Bayesian "disingenuous" does not need to be for personal gain. By definition: "not candid or sincere, typically by pretending that one knows less about something than one really does."
@turtle6agqe is simply stating that it seems like @ZviMowshowitz is lying about whether he understands the objections being raised by traders on this market.
@Bayesian To be clear, I don't think he's being disingenuous (nor do I think he's lying). I think he's just not really thinking through the objections. He owns a lot of markets and is trying to make quick responses to questions, and some of those responses are going to be dumb!
What's important is continuing to iterate on feedback
@Bayesian Oh, the obvious reason why one might lie is because they have some core worldview that would be undermined by telling the truth. When MAGA people are quizzed about world events outside a political context, they often get things right that they will get wrong when polled in a political context.
Are they lying for personal gain when asked in the political context? Not really. They don't really gain anything except the satisfaction of expressing their political affiliation.
As I already stated, I don't think that's what's happening here, but it's a plausible explanation in some circumstances, and a great use of the word "disingenuous"
@ZviMowshowitz what would you think if Grok-3 was the most powerful low-latency model (e.g. better than Sonnet, Gemini 2, o3-mini on low compute) but also clearly less powerful than o1?
"it will be my subjective take on 'is this plausibly the best LLM I can access right now'"
Seems like reasoning models don't count
@JoshYou When I asked the question I did not anticipate reasoning models. I am going to say that reasoning models are a different class of thing, and that they don't count for this purpose.
@ZviMowshowitz All models are reasoning models to some extent. It feels pretty artificial to exclude those with a separate chain of thought. For all we know, Grok 3 might use really long chains of thought for tough questions (burning through way more inference compute than its competitors), or it might even be a high-latency reasoning model itself.
On top of that, OpenAI doesn’t seem to care much about GPT-4o anymore. These days, it’s just a crowd-pleaser with a high LLM Arena rating but slightly weaker in benchmarks compared to its version from last May. Google also seems more focused on LLM Arena, as their newest Gemini is actually weaker than the current GPT-4o in benchmarks. It looks like all the big companies are putting their main efforts into AIs with highly variable inference compute, so Grok 3 would be competing in a race that everyone else has mostly abandoned. Calling it the most powerful AI in the world is like letting a young guy compete in the senior Olympics and then crowning him world champion.