Resolves YES if there is a model that receives a natural language description (e.g. "Give me a video of a puppy playing with a kitten") and outputs a realistic-looking video matching the description.
It does *not* have to be *undetectable* as AI generated, merely "realistic enough".
It must be able to consistently generate realistic videos >=30 seconds long to count.
DALL-E 2 (https://cdn.openai.com/papers/dall-e-2.pdf) counts as "realistic enough" *image* generation from natural language descriptions. (I am writing this before the model is fully available; if it turns out that all the samples are heavily cherry-picked, DALL-E 2 does not count, but a hypothetical model as good as the cherry-picked examples would.)
Duplicate of https://manifold.markets/vluzko/will-there-be-realistic-ai-generate
Update 2024-12-23 (PST) (AI summary of creator comment):
- Videos must be coherent throughout the full duration, meaning they must maintain consistency with the original prompt for the entire video without shifting between unrelated scenes
- Looped scenes do not count
- A single example of a successful video is not sufficient for resolution
- The video must show continuous action/motion (like "two people walking down a city street having a conversation") for the full duration
Update 2024-12-24 (PST): - The success rate must be at least 66% of DALL-E 2's success rate, not a flat 66%. (AI summary of creator comment)
Update 2025-05-01 (PST) (AI summary of creator comment): Evidence must be publicly available. Having the model publicly available does not suffice.
If sample videos meeting the criteria are found, resolution will be delayed until more information is available.
@vluzko So the model seems capable of longer durations; the limit is in the interface.
As reported by TechCrunch:
"Veo 2 can create two-minute-plus clips in resolutions up to 4K (4096 x 2160 pixels)... It’s a theoretical advantage for now, granted. In Google’s experimental video creation tool, VideoFX, where Veo 2 is now exclusively available, videos are capped at 720p and eight seconds in length."
Is this market about what exists, or what is publicly available? Does it make a difference to you if we can find longer sample videos?
@robm Evidence publicly available: yes. Model publicly available: no. If you can find sample videos that meet the criteria, I will at least delay resolving the market until more information is available.
I suspect the above quote is technically true bullshit, in that the model will happily spit out frames until it OOMs.
@vluzko I agree it should resolve NO, but that is not the only criterion Veo 2 does not meet.
Additionally, you mentioned that there needs to be sufficient access to show, from testing, a success rate of at least ~66% [edit: sorry, 66% of DALL-E 2's rate, just to be clear] at consistently generating such videos. And then that test needs to pass. There's cause for a lot of skepticism at each step, even if many published videos are sufficiently "realistic."
This was a stronger reason to bet NO than limited length.
@jgyou yet another maximally entertaining market lol, could have resolved the other way given a few more weeks and/or if goal posts had been placed slightly more permissively
@MrLuke255 I am pretty sure Sora cannot consistently produce coherent 30 second videos even in chunks, but feel free to share examples
@vluzko What do you mean by “coherent”?
I don’t have a subscription, and if I bought one, I believe I would only be able to generate a single 30-second video(?). A single example wouldn’t suffice, I guess?
So, is this market conditional on someone providing a proof you deem sufficient?
@MrLuke255 Coherent as in it needs to be able to stick to a prompt the whole time - I've seen many examples of videos where the video shifts every few seconds to a different scene, but those don't count. I've also seen videos that are basically a single scene on loop for thirty seconds, those don't count either. You should be able to prompt it with something like "two people walking down a city street having a conversation" and get that for thirty seconds.
Single examples do not count.
If you mean "will I pay money to resolve this market" the answer is no. I wouldn't recommend spending money on Sora to try to resolve this, I haven't seen Sora make anything that would resolve this market.
@vluzko The Veo model can create longer clips, but it's limited in the interface currently available.
https://openai.com/sora Looking pretty DALL-E 2 quality 👀 to me (reasonable to wait for the dust to settle re possibility of cherry picking though)
@CalebW The examples on that page meet the quality bar and one of them is >30 seconds long. I think it is very likely that this market will resolve YES, but I am going to wait to make sure they're not cherry-picked.
@vluzko thanks for the info. Can you say more about how you will determine this? E.g., at approximately what success rate would the model need to take a prompt of that difficulty ("Give me a video of a puppy playing with a kitten"), specified to be over 30 seconds if that specification is possible, and produce a video of the quality of the "Tokyo street" video?
And does a YES resolution require third-party access such that you or a trusted person can test for cherry-picking?
@Jacy I'm going to go back to the informal evals done with DALL-E 2 when it was released to get a rough sense of what fraction of generated images were reasonable at different levels of prompt complexity. I'll accept a video generator if its success rate (for 30-second videos) is, say, >=66% of DALL-E 2's.
@vluzko thanks! I take that to mean third-party access will be required so you or someone you trust can run that test. Personally, I think the progress in text-to-video is really impressive, but I expect there to be major challenges in getting video of the quality of what's in the company's announcement showcase—similar to what we saw with Pika a few months ago.