Will AI agents be able to regularly code small features for us in a year?
327
Ṁ350k
Jul 2
95% chance

I'm thinking of something like https://mentat.ai/, but that actually works.

I will provide a paragraph or so describing the change I want made. Then it should create a GitHub PR, which I will review, leaving only a few comments before merging. The whole process should take less than 30 minutes. This should work fairly reliably.

I tried this yesterday and it failed haha:
https://github.com/manifoldmarkets/manifold/pull/2694

See more discussion in my post:

https://jamesgrugett.com/p/software-automation-will-make-us

bought Ṁ5,000 YES 26d

This looks good to me. Stephen gave it two prompts to create this, and I think it took less than 10 mins: https://github.com/manifoldmarkets/manifold/pull/3588

26d

@ian Looks like we need another prompt to fix the type error, should come in well under 30 mins still, though

bought Ṁ50 NO

@ian Initial comment was more than 30 minutes ago, so this is a failure

bought Ṁ1,000 NO 26d
bought Ṁ2,500 YES at 93% 26d
26d

@CalibratedNeutral oh we stopped paying attention

26d

@CalibratedNeutral I don't know if stephen told it to fix the type error

26d

@ian the key to vibe-coding is to stay just the right amount drunk and not to overdo it

bought Ṁ50 YES 26d

I think Claude 4 with GitHub does what the mentat.ai thing you linked does

bought Ṁ250 NO 1mo

@ian do you have access to ChatGPT Plus or Pro, and would you be willing to see how codex-1 fares? It's currently only accessible on Pro and Teams iirc, but will probably be accessible to Plus before the market closes

bought Ṁ5,000 YES 1mo
1mo

GPT 4.1 is awesome for coding.

It's genuinely really good. (mini is ok, nano is dogwater). I have been using it off Azure with Cursor, both as an assistant and as a tedious-implementation speedrunner. It has one-shot so many instructions that 4o would have a bad time with, and that Claude would overthink.

Not tab complete, mostly just asking stuff. Really has come a long way with code

1mo

Crazy how ai agents are regularly building small features for me almost daily and this market is still at 80%

1mo

@DarklyMade is this code peer reviewed?

1mo

@Kire_ of course! The peer review AI looks at it!

1mo

I'd like to conduct some tests using codebuff/cursor. What are acceptable small features in your mind? I have a couple ideas:
- Add a button to the comment's bottom row that allows users to tip the commenter. Denormalize the tip amount onto the comment and display the total tipped amount on the button.
- Add a delete button for admins/mods that marks a comment as deleted (don't actually delete the comment, just set both the deleted and hidden flags) and hides the comment completely from the market.
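
For concreteness, here is a minimal sketch of roughly what those two ideas amount to. Everything in it is hypothetical: the `MarketComment` and `Tip` shapes, the field names, and the helper functions are assumptions for illustration, not Manifold's actual schema or code.

```typescript
// Hypothetical comment shape; field names are assumptions, not Manifold's real schema.
interface MarketComment {
  id: string
  text: string
  tipTotal: number // denormalized sum of all tips on this comment
  deleted: boolean
  hidden: boolean
}

// Hypothetical tip record.
interface Tip {
  commentId: string
  amount: number
}

// Record a tip and keep the denormalized total on the comment in sync,
// so the button can display the total without re-summing every tip.
function applyTip(comment: MarketComment, tip: Tip): MarketComment {
  if (tip.commentId !== comment.id) throw new Error('tip is for a different comment')
  if (tip.amount <= 0) throw new Error('tip amount must be positive')
  return { ...comment, tipTotal: comment.tipTotal + tip.amount }
}

// Soft delete: the comment is kept, but both flags are set.
function softDelete(comment: MarketComment, isAdminOrMod: boolean): MarketComment {
  if (!isAdminOrMod) throw new Error('only admins/mods can delete comments')
  return { ...comment, deleted: true, hidden: true }
}

// Comments the market page would still render after soft deletes.
function visibleComments(comments: MarketComment[]): MarketComment[] {
  return comments.filter((c) => !c.deleted && !c.hidden)
}
```

In a real codebase the agent would also have to touch the UI, the API endpoint, and the database, which is what makes these reasonable "small feature" tests.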

1mo

@JamesGrugett said the delete comment button for spam fit the bill, I'll try using codebuff to do this soon

1mo

@ian a "view results" button on polls?

1mo

@cthor Also seems reasonable!

@ian I am aware that you work on Manifold, but since you are also the largest YES holder, can we maybe agree to let @JamesGrugett do these kinds of evaluations once the time comes?

1mo

@CalibratedNeutral That sounds reasonable, although he doesn't work at manifold anymore so I'm not sure if he'll want to put 30 mins in to do this. I was going to film my attempt from scratch

@CalibratedNeutral I was not aware of that. Then maybe a third party (another developer working on Manifold)? The stakes are reasonably high for me, so I would strongly prefer to have everything as unbiased as possible.

1mo

@CalibratedNeutral We might be able to get @SG or @SirSalty to do it

1mo

@CalibratedNeutral Alternatively, @JamesGrugett could test this question on his new startup, codebuff. He uses codebuff to help develop codebuff

@ian Either option sounds good to me as long as the resolution criteria are followed according to @JamesGrugett's judgement

1mo

@ian how tf did you get the dead head badge?
