
Did these guys just straight up solve the generative video problem?

The results are better than anything that I've ever seen.

What's the catch? Large processing times? Are the results cherry picked? Or what?

I guess it only works for video to video, but that's still amazing!



Video-to-video is not that special; we've had temporally stable solutions for it for some time now, and they're even in commercially available apps. The real test is true text-to-video, which is much harder.

To my knowledge, the only open-source solution that works well for text-to-video is Zeroscope v2 XL, and v3 is coming soon. v2 is already on par with RunwayML's Gen-2, while v3 is better.


Runway runs circles around Zeroscope, but that shouldn't stop Zeroscope from catching up and/or surpassing it. Both Runway and Pika Labs deliver better quality at the moment. Evidence: I've been struggling with all three of them.

Runway outputs the best video quality and options for video length, whilst Pika delivers better fidelity to an input image used as inspiration. All of this is subject to change without notice.


In the car example the wheels don't spin, which is interesting.

The original frames have different wheel angles, so a simple text-prompted img2img frame-by-frame approach would preserve the motion, but at the cost of inter-frame consistency.

Here you get a consistent look for the scene with no rapid transitions, but the wheel motion is gone.
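
For contrast, here's a minimal sketch of that naive frame-by-frame img2img approach, assuming the diffusers library and a Stable Diffusion checkpoint; the prompt, frame paths, and strength value are placeholders, not taken from the linked work:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Hypothetical frame paths and prompt, purely for illustration.
    prompt = "a vintage sports car driving along a coastal road, oil painting"
    frames = [Image.open(f"frames/{i:04d}.png").convert("RGB").resize((512, 512))
              for i in range(120)]

    for i, frame in enumerate(frames):
        # Each frame is stylized independently, so per-frame motion (e.g. the
        # wheel rotation) survives, but nothing ties one frame's output to the
        # next, which is what produces the flicker described above.
        out = pipe(prompt=prompt, image=frame, strength=0.5, guidance_scale=7.5)
        out.images[0].save(f"out/{i:04d}.png")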


Being able to condition on a video vs. just text massively simplifies the task, mainly because you get consistent camera motion and movement in the scene for free!


If I understand correctly, you have to train a custom NLA model for each video segment before actually using it.



