Resolves YES if there is a model that receives a natural language description (e.g. "Give me a video of a puppy playing with a kitten") and outputs a realistic-looking video matching the description.
It does *not* have to be *undetectable* as AI generated, merely "realistic enough".
It must be able to consistently generate realistic videos >=30 seconds long to count.
DALL-E 2 (https://cdn.openai.com/papers/dall-e-2.pdf) counts as "realistic enough" *image* generation from natural language descriptions. (I am writing this before the model is fully available; if it turns out that all the samples are heavily cherry-picked, DALL-E 2 does not count, but a hypothetical model as good as the cherry-picked examples would.)
Duplicate of https://manifold.markets/vluzko/will-there-be-realistic-ai-generate
Update 2024-12-23 (PST) (AI summary of creator comment):
- Videos must be coherent throughout the full duration, meaning they must maintain consistency with the original prompt for the entire video without shifting between unrelated scenes
- Looped scenes do not count
- A single example of a successful video is not sufficient for resolution
- The video must show continuous action/motion (like "two people walking down a city street having a conversation") for the full duration
Update 2024-12-24 (PST): - The success rate must be at least 66% of DALL-E 2's success rate, not a flat 66%. (AI summary of creator comment)
Update 2025-05-01 (PST) (AI summary of creator comment): Evidence must be publicly available. Having the model publicly available does not suffice.
If sample videos meeting the criteria are found, resolution will be delayed until more information is available.
@vluzko So the model seems capable of longer durations; the limit is in the interface.
As reported by TechCrunch:
"Veo 2 can create two-minute-plus clips in resolutions up to 4K (4096 x 2160 pixels)... It’s a theoretical advantage for now, granted. In Google’s experimental video creation tool, VideoFX, where Veo 2 is now exclusively available, videos are capped at 720p and eight seconds in length."
Is this market about what exists, or what is publicly available? Does it make a difference to you if we can find longer sample videos?
@robm Evidence publicly available: yes. Model publicly available: no. If you can find sample videos that meet the criteria, I will at least delay resolving the market until more information is available.
I suspect the above quote is technically true bullshit, in that the model will happily spit out frames until it OOMs.
@vluzko I agree it should resolve NO, but that is not the only criterion Veo 2 does not meet.
Additionally, you mentioned that there needs to be sufficient access to show, from testing, a success rate of at least ~66% [edit: sorry, 66% of DALL-E 2's rate, just to be clear] at consistently generating such videos. And then that test needs to pass. There's cause for a lot of skepticism at each step, even if many published videos are sufficiently "realistic."
This was a stronger reason to bet NO than limited length.
@jgyou yet another maximally entertaining market lol, could have resolved the other way given a few more weeks and/or if goal posts had been placed slightly more permissively
@MrLuke255 I am pretty sure Sora cannot consistently produce coherent 30 second videos even in chunks, but feel free to share examples
@vluzko What do you mean by “coherent”?
I don’t have a subscription, and if I bought one, I believe I would only be able to generate a single 30-second video(?). A single example wouldn’t suffice, I guess?
So, is this market conditional on someone providing a proof you deem sufficient?
@MrLuke255 Coherent as in it needs to be able to stick to a prompt the whole time - I've seen many examples of videos where the video shifts every few seconds to a different scene, but those don't count. I've also seen videos that are basically a single scene on loop for thirty seconds, those don't count either. You should be able to prompt it with something like "two people walking down a city street having a conversation" and get that for thirty seconds.
Single examples do not count.
If you mean "will I pay money to resolve this market" the answer is no. I wouldn't recommend spending money on Sora to try to resolve this, I haven't seen Sora make anything that would resolve this market.
@vluzko The Veo model can create longer clips, but it's limited in the interface currently available.
https://openai.com/sora Looking pretty DALL-E 2 quality 👀 to me (reasonable to wait for the dust to settle re possibility of cherry picking though)
@CalebW The examples on that page meet the quality bar and one of them is >30 seconds long. I think it is very likely that this market will resolve YES, but I am going to wait to make sure they're not cherry-picked.
@vluzko thanks for the info. Can you say more about how you will determine this? E.g., at approximately what success rate would the model need to take a prompt of that difficulty ("Give me a video of a puppy playing with a kitten"), specified to be over 30 seconds if that specification is possible, and produce a video of the quality of the "Tokyo street" video?
And does a YES resolution require third-party access such that you or a trusted person can test for cherry-picking?
@Jacy I'm going to go back to the informal evals done with DALL-E 2 when it was released to get a rough sense of what fraction of generated images were reasonable at different levels of prompt complexity. I'll accept a video generator if its success rate (for 30-second videos) is, say, >=66% of DALL-E 2's.
@vluzko thanks! I take that to mean third-party access will be required so you or someone you trust can run that test. Personally, I think the progress in text-to-video is really impressive, but I expect there to be major challenges in getting video of the quality of what's in the company's announcement showcase—similar to what we saw with Pika a few months ago.