Will LLMs solve the first 15 days of Advent of Code?
Basic
21
Ṁ844
Dec 16
7% chance

Advent of Code is an annual advent-calendar-style series of daily coding problems that get progressively more difficult. Using only LLMs, will I be able to solve the first 15 days' worth of problems?

The AI must be able to solve both parts of the problem for each of the listed puzzles from December 1st - December 15th, 2024 for this to Resolve YES.

I will not be providing any algorithmic help to the LLM; it needs to come up with the meat of the solution itself. However, if it makes simple mistakes, the kind that people who don’t know how to code would catch, I will assist it with those. I will be sharing all successful prompts here, and will not be betting in this market.
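
As a rough illustration of the resolution rule above, here is a minimal sketch, not the tooling actually used for this market. The `results` record and the names `resolves_yes` and `unsolved` are hypothetical, introduced only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical record: day number -> (part 1 solved, part 2 solved).
Results = Dict[int, Tuple[bool, bool]]

# December 1st through December 15th, as stated in the description.
REQUIRED_DAYS = range(1, 16)


def unsolved(results: Results) -> List[Tuple[int, int]]:
    """List every (day, part) pair that is still missing a solution."""
    missing = []
    for day in REQUIRED_DAYS:
        # A day with no entry counts as unsolved for both parts.
        parts = results.get(day, (False, False))
        for part_number, solved in enumerate(parts, start=1):
            if not solved:
                missing.append((day, part_number))
    return missing


def resolves_yes(results: Results) -> bool:
    """YES only if both parts of every required day were solved."""
    return not unsolved(results)
```

Under this reading, a single missing part on any of the 15 days is enough to prevent a YES resolution.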


If you wait too long to test, the existing solutions on GitHub might be incorporated into the training data, making it no longer a fair test. The most interesting challenges are day 12 part 2, day 14 part 2, and day 15.

For day 14 part 2, many complained that part 2 is too ambiguous. It would be interesting to see o1's reasoning on this.

For day 15, both Claude 3.5 Sonnet and GPT-4o produce code with identical incorrect results (for me). Could be related to the many "visualizations" in the problem statement that are probably hard to parse for an LLM.

@mattyb Did you wind up doing this?

This looks like it'll be successful, given the leaderboard of AoC this year

bought Ṁ50 YES

I originally bought NO but after trying out o1-preview on a variety of coding challenge questions, I’ve flipped to YES 🌶️

@biased niiice. i’m really excited to spend my December doing a ton of prompt engineering for this market and /mattyb/ai-capabilities-2024-mega-market both.

I’m not a huge AI guy, but by January I’m hoping to be way more knowledgeable and experienced. The perks of making fun markets 😃

How much of an effort will you make to prompt engineer/format the question so the LLM can process it better? If the LLM codes an incorrect solution, will you allow it to try again, and how much feedback will you give it? Output from its program run on the examples? How long will you prompt it to get it to correct its answer? If it makes a small but subtle logic error (for example, ignoring an edge case) will that count as a "simple error"?

@MaxMorehead I’m going to keep prompting and re-prompting (without giving it algorithmic help). I won’t mark it wrong and move on, if that’s the alternative. I’ll keep tweaking minor things and retrying.

Are you going to pick a specific LLM to use, or try out several of them?

@TimothyJohnson5c16 i’m going to try out a few, and ideally one is especially good at those kinds of complex, wordy coding problems. but no, this isn’t about a single product

Does anyone know how well GPT-4 turbo did on 2023 AoC?

@dominic Or a more recent one like 3.5 sonnet if that has been tried

Does it need to succeed on first prompt or can you ask it to correct the code? I think we need clarification on how you will be measuring if the LLM can solve the problem.

@coproduct yea, this is very fair. i’ve done AoC for the last few years, so it’s reasonable that i could corrupt this with hints.

obviously, i’m not going to use an LLM as an IDE here and just have it code a solution i already have. however, if the LLM gets 90% of the way to a solution and makes an obvious mistake that someone who doesn’t know how to program could’ve caught (like it doesn’t answer my question, and i ask it a different way) i’ll have it correct that.

i’m not betting here, and will be sharing all successful prompts. i’ll add this to the description.
