Turns out that DALL-E is very bad at doing so.
Any general-purpose image-generation AI is allowed (DALL-E 3, Midjourney, etc.). Prompt engineering is allowed. To qualify, the AI and prompt must have a success rate of at least 5 in 20 images when tested.
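To get a sense of how strict the 5-in-20 bar is, here is a minimal Python sketch. It assumes each image succeeds independently with the same probability p. That is a simplification, since outputs from one fixed prompt may well be correlated. `pass_probability` is just an illustrative helper name.

```python
from math import comb

def pass_probability(p: float, n: int = 20, k: int = 5) -> float:
    """P(at least k of n images succeed), assuming each image succeeds
    independently with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# How often generators with various true success rates would pass one 20-image test.
for p in (0.10, 0.25, 0.40):
    print(f"p={p:.2f}: P(at least 5 of 20) = {pass_probability(p):.3f}")
```

Under that assumption, a generator whose true rate is exactly 25% would pass a single 20-image test only about 58.5% of the time. A model that barely clears the bar can still fail the test.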
To be considered a success, an image must contain:
An 8x8 checkered board, with all squares colored correctly.
All chess pieces in their correct starting positions. The chess pieces must be clearly identifiable as their correct type (e.g., a rook must clearly look like a rook).
No extra chess pieces
Images must be generated from a prompt only.
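Once an image has been transcribed into a board, the piece criteria can be checked mechanically. This is a minimal sketch. It assumes the transcription has already been written as the piece-placement field of FEN, either by a person or by some hypothetical recognizer that is not shown. `expand` and `mismatches` are illustrative names. The starting position is the standard FEN placement `rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR`, and requiring an exact match also rules out extra pieces. The sketch does not check square colors or how recognizable each piece is, and recognition is the hard part.

```python
# Standard FEN piece-placement field for the starting position (rank 8 first).
START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"

def expand(placement: str) -> list[list[str]]:
    """Expand a FEN placement field into 8 rows of 8 cells, '.' meaning empty."""
    rows = []
    for rank in placement.split("/"):
        row: list[str] = []
        for ch in rank:
            if ch.isdigit():
                row.extend("." * int(ch))  # a digit is a run of empty squares
            else:
                row.append(ch)             # uppercase = White, lowercase = Black
        rows.append(row)
    if len(rows) != 8 or any(len(r) != 8 for r in rows):
        raise ValueError("not an 8x8 placement")
    return rows

def mismatches(placement: str) -> list[str]:
    """List every square where the transcribed board differs from the start."""
    got, want = expand(placement), expand(START)
    errors = []
    for r in range(8):
        for c in range(8):
            if got[r][c] != want[r][c]:
                square = "abcdefgh"[c] + str(8 - r)  # row 0 is rank 8
                errors.append(f"{square}: expected {want[r][c]}, found {got[r][c]}")
    return errors

print(mismatches(START))  # []
# White king and queen swapped, the "hardest part" mentioned later in the thread:
print(mismatches("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBKQBNR"))
# ['d1: expected Q, found K', 'e1: expected K, found Q']
```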
@ProjectVictory lumalabs. I used their new model iteratively (re-prompted dozens of times until the output was perfect).
@ProjectVictory It would be trivial to create an API that does this automatically, in essence making a much-improved model.
Still, this was cherry-picked. The king/queen is still the hardest part.
@Hazel Did you use a fixed series of prompts? If not, how would you make an API that does this automatically?
@MaxMorehead yes, same prompt over and over. If I wanted to make some mana, I could easily build this before the end of the year. It’s easy to repro.
@Hazel Oh, it would have taken a couple more iterations to get the queen in the right place. I’m a callable human in the loop lol.
To be clear, you have to verify it's correct?
@Shump How would this resolve if it's possible to build a scaffolded system that generates a chessboard (e.g., calling DALL-E multiple times and using GPT-4o to verify whether the image is correct)? Would it change if there are more purpose-built parts to the scaffolding (e.g., taking portions of the image and using specific prompts to verify those)?
@Hazel I think if you can build a tool that can select correct chess boards, that's the same thing as a YES resolution. I also think you can't do that.
@Juniper0rg1m Surely the tool has to meet some conditions. If we're allowed to use an arbitrary type of program, you could build an AI that is explicitly coded to return an image of a chessboard, or one trained specifically to generate only chessboards.
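The scaffold described in that question (re-run one prompt, keep the first image a verifier accepts) can be sketched as a plain retry loop. `generate` and `verify` are hypothetical placeholders for something like an image-model call and a GPT-4o check. No real API signatures are assumed. The `chance_within` helper is also illustrative and assumes both a perfect verifier and independent attempts.

```python
from typing import Callable, Optional, TypeVar

Image = TypeVar("Image")

def generate_until_verified(
    prompt: str,
    generate: Callable[[str], Image],   # hypothetical: prompt -> image
    verify: Callable[[Image], bool],    # hypothetical: image -> "is it correct?"
    max_attempts: int = 20,
) -> tuple[Optional[Image], int]:
    """Re-run the same prompt until the verifier accepts an image.

    Returns (image, attempts_used), or (None, max_attempts) if nothing passed.
    """
    for attempt in range(1, max_attempts + 1):
        image = generate(prompt)
        if verify(image):
            return image, attempt
    return None, max_attempts

def chance_within(p: float, attempts: int) -> float:
    """P(at least one accepted image), assuming a perfect verifier and
    independent attempts that each succeed with probability p."""
    return 1 - (1 - p) ** attempts

# Toy run with a fake generator that succeeds on the third try.
outputs = iter(["bad", "bad", "good"])
image, used = generate_until_verified(
    "a chess board in the starting position",
    generate=lambda prompt: next(outputs),
    verify=lambda img: img == "good",
)
print(image, used)  # good 3
```

Under those assumptions, even a 5% per-image rate gives roughly a 64% chance of at least one correct board in 20 tries (`chance_within(0.05, 20)`). In practice the loop is only as good as `verify`, since any false positive is returned as a success.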
Even classical image recognition techniques could probably determine if a setup is a legal chessboard with 25% accuracy, given a chessboard of a fixed size.
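For the board itself, a classical check is straightforward when the image has a fixed size. This is a minimal sketch under several assumptions. The input is a square, axis-aligned, top-down grayscale crop of just the board, given as nested lists of values from 0 to 255. White is at the bottom, so a1 is the bottom-left cell. Each cell is judged by a small patch at its top-left corner, on the assumption that pieces rarely cover that spot. `square_colors_ok` and `make_board` are illustrative names. Identifying the pieces is a much harder problem and is not attempted here.

```python
def square_colors_ok(gray: list[list[int]], margin: float = 0.15) -> bool:
    """Check the light/dark alternation of an 8x8 board image.

    Assumes a square, top-down grayscale crop of only the board
    (0 = black, 255 = white) with a1 at the bottom-left.
    """
    cell = len(gray) // 8
    patch = max(1, int(cell * margin))  # side of the sampled corner patch
    means = []
    for r in range(8):                  # r = 0 is the top row (rank 8)
        for c in range(8):              # c = 0 is the a-file
            y0, x0 = r * cell, c * cell
            vals = [gray[y][x] for y in range(y0, y0 + patch)
                    for x in range(x0, x0 + patch)]
            means.append((r, c, sum(vals) / len(vals)))
    # 32 light and 32 dark patches, so the overall mean separates them.
    threshold = sum(m for _, _, m in means) / len(means)
    for r, c, m in means:
        should_be_dark = (r + c) % 2 == 1  # a8 (top-left) light, a1 dark, h1 light
        if (m < threshold) != should_be_dark:
            return False
    return True

def make_board(cell: int = 10, dark: int = 60, light: int = 200) -> list[list[int]]:
    """Synthetic test image: a plain 8x8 board with no pieces."""
    size = 8 * cell
    return [[dark if ((y // cell) + (x // cell)) % 2 == 1 else light
             for x in range(size)] for y in range(size)]

print(square_colors_ok(make_board()))                    # True
print(square_colors_ok(make_board(dark=200, light=60)))  # False: colors swapped
```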
@TobiasWegener I think we are getting pretty close with Flux.
Problems:
There seems to be a rug, and both sides are white.
The figures seem quite good now.
@ProjectVictory Yeah, you are right, and there's the strange line in front of the queen: a lot of small mistakes. Interesting how hard it is to see many of them.
I've been trying to cue the model into producing a diagram, since that's presumably easier, but it's not quite getting there. I think the problem is very similar to producing text, if you think of chess pieces as symbols and chess boards as phrases.