
Did these guys just straight up solve the generative video problem?

The results are better than anything that I've ever seen.

What's the catch? Large processing times? Are the results cherry picked? Or what?

I guess it only works for video to video, but that's still amazing!



Video-to-video is not that special; we've had temporally stable solutions for it for some time now, and they're even in commercially available apps. The real test is true text-to-video, which is much harder.

To my knowledge, the only open-source solution that works well for text-to-video is Zeroscope v2 XL, and v3 is coming soon. v2 is already on par with RunwayML's Gen-2, while v3 is better.


Runway runs circles around Zeroscope, but that shouldn't stop Zeroscope from catching up and/or surpassing it. Both Runway and Pika Labs deliver better quality at the moment. Evidence: I've been struggling with all three of them.

Runway outputs the best video quality and options for video length, whilst Pika delivers better fidelity to an input image used as inspiration. All of this is subject to change without notice.


In the car example the wheels don't spin, which is interesting.

The original frames have different wheel angles, so a simple text-prompted img2img frame-by-frame approach would preserve the motion, but at the cost of inter-frame consistency.

Here you get a consistent look for the scene with no rapid transitions, but the wheel motion is gone.
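
For contrast, here's a minimal sketch of that naive frame-by-frame img2img approach, assuming the diffusers library and a Stable Diffusion checkpoint; the prompt, frame paths, and strength value are placeholders, not taken from the linked work:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Hypothetical frame paths and prompt, purely for illustration.
    prompt = "a vintage sports car driving along a coastal road, oil painting"
    frames = [Image.open(f"frames/{i:04d}.png").convert("RGB").resize((512, 512))
              for i in range(120)]

    for i, frame in enumerate(frames):
        # Each frame is stylized independently, so per-frame motion (e.g. the
        # wheel rotation) survives, but nothing ties one frame's output to the
        # next, which is what produces the flicker described above.
        out = pipe(prompt=prompt, image=frame, strength=0.5, guidance_scale=7.5)
        out.images[0].save(f"out/{i:04d}.png")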


Being able to condition on a video vs. just text massively simplifies the task, mainly because you get consistent camera motion and movement in the scene for free!


If I understand correctly, you have to train a custom NLA model for each video segment before actually using it.



