When will LLMs be better at Paradox grand strategy games than the in-game AI for NPCs?

Ṁ655

2031

Resolution Criteria

This market resolves to the date when Large Language Models (LLMs) are demonstrably better at playing Paradox grand strategy games (such as Europa Universalis, Crusader Kings, Hearts of Iron, Stellaris, or Victoria) than the built-in AI that controls non-player characters (or nations).

The relevant Paradox games are those current at the time of resolution.

If Paradox integrates LLMs into the AI for NPCs, that counts as admitting that LLMs are better at the task, and this market will resolve to the date the relevant game (or patch, or DLC) is released to the public.

Otherwise, this market will resolve when there is publicly available code I can run, alongside a copy of one of the then-current generation of Paradox GSGs, which consistently plays the game well (in single-player mode). It doesn't need to achieve world conquest or anything, or even play as well as any given human player would play. But it needs to consistently avoid faceplanting. If it semi-consistently achieves success (relative to its starting position), the way even a significantly less-than-median human player can, that's enough to resolve the market.

The level of skill I'm talking about here is one a human player can reach within tens of hours of play time; this isn't meant to be a high bar.

The LLM-based AI can be specialized for playing Paradox games, or one particular game. It can be fine-tuned to the task, or include e.g. specialized tool-calling. I need to be able to run it against a game running on my computer (or in a virtual machine), but the model itself need not be a local one; i.e. it can call the API of a proprietary hosted LLM like Claude or GPT.

As the resolution criteria are somewhat subjective, I will not bet on this market.

Update 2025-06-06 (PST) (AI summary of creator comment): The creator has clarified the allowed input mechanisms for the LLM when evaluating its ability to play the game:
- The LLM should interact with the game using an interface similar to what a human player uses.
- Allowed inputs include sensory information a human would receive, such as the screen and audio.
- Save files are not considered a primary input method for the LLM's ongoing gameplay.
- The creator mentions the style of "Claude Plays Pokemon" as an example of the intended interaction.
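
As a rough illustration of the kind of harness described in this update, here is a minimal sketch of an observe/decide/act loop. Every name in it (`Observation`, `Action`, `run_harness`, and the stub functions) is hypothetical and invented for this example; it is not the code of "Claude Plays Pokemon" or of any existing project. In a real harness, `observe` would capture the game's screen and audio, `decide` would call an LLM, and `act` would send mouse and keyboard input to the game.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """What a human player would perceive (hypothetical structure)."""
    screenshot: bytes  # encoded image of the game window
    audio: bytes       # recent game audio


@dataclass
class Action:
    """One human-style input (hypothetical structure)."""
    kind: str     # e.g. "click", "key", or "wait"
    payload: str  # e.g. "640,360" for a click or "space" for a key


def run_harness(
    observe: Callable[[], Observation],
    decide: Callable[[Observation, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int,
) -> List[Action]:
    """Repeatedly observe the game, ask the model for an action, and perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = observe()                 # screen and audio only, no save files
        action = decide(obs, history)   # an LLM call would go here
        act(action)                     # same inputs a human could give
        history.append(action)
    return history


# Stand-in stubs so the sketch runs without a game or a model.
def fake_observe() -> Observation:
    return Observation(screenshot=b"", audio=b"")


def fake_decide(obs: Observation, history: List[Action]) -> Action:
    # Unpause once with the space bar, then do nothing.
    return Action("wait", "") if history else Action("key", "space")


performed: List[Action] = []
log = run_harness(fake_observe, fake_decide, performed.append, max_steps=3)
print([a.kind for a in log])
```

The three callables are passed in separately so the same loop could be pointed at any game or any model; only the stubs would need replacing.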

This question is managed and resolved by Manifold.

#AI

#Technical AI Timelines

#LLMs

#Gaming

#Paradox Interactive

2 Comments

4 Holders

20 Trades

What's the allowed input to the LLM? Screenshots, save files, etc?

I was thinking something in the style of Claude Plays Pokemon. Some harness connecting the LLM to the same basic interface that humans use. Not save files, but definitely the screen, audio, and so forth; anything a human would receive while normally playing the game is certainly valid.
