Hacker News | sidneyprimas's comments

After much user feedback, we removed the Infinity watermark from the generated videos. Thanks for the feedback. Enjoy!


Interesting! Are you saying you would first want tools to really design your character, and only after start making videos with the character you built? That's interesting.


HeyGen (and our V1 model) literally uses the user on-boarding video in the final output. See here for a demonstration of this (https://toinfinityai.github.io/v2-launch-page/#comparisons). We are not talking about that in this thread. We are trying to solve a quirk of our Diffusion Transformer model (V2 model).

Our V2 model is trained on specific durations of audio (2s, 5s, 10s, etc.) as input. So, if we give the model a 7s audio clip during inference, it will generate lower-quality videos than at 5s or 10s. Instead, we buffer the audio out to the nearest training bucket (10s in this case). We have tried buffering it with a zero array, with white noise, and by concatenating the input audio (inverted) to the end. The drawback is that the last frame (the one at 7s) has a higher likelihood of failing. We need to solve this.
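A minimal sketch of the bucketing described above. The bucket list, sample rate, and padding strategy are assumptions for illustration, not Infinity's actual code; the filler here uses the "concatenate the input audio, inverted" strategy the comment mentions.

```python
import numpy as np

# Assumed training buckets (seconds) and sample rate -- illustrative only.
BUCKETS_S = [2, 5, 10]
SAMPLE_RATE = 16_000

def pad_to_bucket(audio: np.ndarray, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Pad a mono clip up to the nearest training-bucket duration."""
    duration = len(audio) / sr
    # Smallest bucket that fits the clip; fall back to the largest.
    target = next((b for b in BUCKETS_S if b >= duration), BUCKETS_S[-1])
    target_len = target * sr
    pad = target_len - len(audio)
    if pad <= 0:
        return audio[:target_len]
    # Fill the gap with the input audio, reversed and inverted.
    filler = -audio[::-1]
    reps = int(np.ceil(pad / len(audio)))
    filler = np.tile(filler, reps)[:pad]
    return np.concatenate([audio, filler])

clip = np.random.randn(7 * SAMPLE_RATE).astype(np.float32)  # a 7 s clip
padded = pad_to_bucket(clip)
print(len(padded) / SAMPLE_RATE)  # 10.0
```

A 7s clip lands in the 10s bucket, so the model only ever sees durations it was trained on; the open problem the comment describes is what happens at the seam around the 7s mark.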

And, no shade on HeyGen. It's literally what we did before. And their videos look hyper realistic, which is great for B2B content. The drawback is you are always constrained to the hand motions and environment of the on-boarding video, which is more limiting for entertainment content.


i already love you guys more than them bc of how transparent you are. keep it up!!


Thank you! Just want many people to use it. And, it's super interesting to see what type of content people are making with it.


Well then. Tik Tok, and keep ticking to you too.


Nice! These are really good. I wanted them to continue telling their story.


   Through many births
   I have wandered on and on,
   Searching for, but never finding,
   The builder of this house.
is from https://en.wikipedia.org/wiki/Dhammapada (https://buddhasadvice.wordpress.com/2021/02/26/dhammapada-15... and http://www.floweringofgoodness.org/dhammapada-11.php).

    This is the way the world ends
    Not with a bang but a whimper.
is from T.S. Eliot, The Hollow Men https://en.wikipedia.org/wiki/The_Hollow_Men (https://interestingliterature.com/2021/02/eliot-this-way-wor...).

The first and second pictures are profile pictures that were generated years ago, before OpenAI went on stage. I keep them around for when I need profile pics for templates. The third one has been in my random-pictures folder for years.


Can you share an example of this happening? I am curious. We can get static videos if our model doesn't recognize it as a face (e.g. an Apple with a face, or sketches). Here is an example: https://toinfinityai.github.io/v2-launch-page/static/videos/...

I would be curious if you are getting this with more normal images.


I got it with a more normal image, which was two frames from a TV show[1]: with "crop face" on, your model finds the face and animates it[2], and with crop face off the picture was static... I just tried to reproduce it to show you, and now instead it's animated both faces.

[1] https://i.pinimg.com/236x/ae/65/d5/ae65d51130d5196187624d52d...

[2] https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...

[3] https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...

But that image was one where it both could find a face and, at least once, gave a static image.


Ya, it's only a matter of time until very high-quality video models are open-sourced.


That's how I see it as well. Very soon, people will assume most videos are AI-generated, and the burden of proof will be on people claiming videos are real. We plan to embed some kind of hash to indicate our video is AI-generated, but people will be able to get around this. Google/Apple/Samsung seem to be in the best place to solve this: whenever their devices record a real video, they can generate a hash directly in hardware for that video, which can be used to verify that it was actually recorded by that phone.
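A minimal sketch of the "embed a hash" idea, under loud assumptions: a real deployment (like the hardware attestation described above) would use an asymmetric signature so anyone can verify without holding a secret, and the key would live in a secure enclave, not application code. HMAC stands in here only because it's in the standard library; the key name is hypothetical.

```python
import hashlib
import hmac

# Hypothetical generator-side secret; in practice this would be an
# asymmetric private key kept server-side or in secure hardware.
SECRET = b"generator-signing-key"

def tag_generated(video_bytes: bytes) -> bytes:
    """Produce a tag asserting 'this video came from our generator'."""
    return hmac.new(SECRET, video_bytes, hashlib.sha256).digest()

def check_tag(video_bytes: bytes, tag: bytes) -> bool:
    """Constant-time check that the tag matches the video bytes."""
    return hmac.compare_digest(tag_generated(video_bytes), tag)

video = b"...encoded video bytes..."
tag = tag_generated(video)
print(check_tag(video, tag))          # True
print(check_tag(video + b"x", tag))   # False: any edit invalidates the tag
```

As the comment notes, this only lets honest parties label their own output; it can't force bad actors to tag theirs, which is why the camera-side signing path matters more.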

Also, I think it will cost around $100k to train a model at this quality level within 1-2 years. And it will only go down from there. So, the genie is out of the bottle.


That makes sense. It isn’t reasonable to expect malicious users to helpfully set the “evil bit,” but you can at least add a little speedbump by hashing your own AI generated content (and the presence of videos that are verifiably AI generated will at least probably catch some particularly lazy/incompetent bad actors, which will destroy their credibility and also be really funny).

In the end, though, the incentive and the capability lie in the hands of camera manufacturers. It is unfortunate that videos from the pre-AI era have no real way to have been made verifiable…

Anyway, recordings of politicians saying some pretty heinous things haven’t derailed some of their campaigns anyway, so maybe none of this is really worth worrying about in the first place.


This makes me so happy. Thanks for reporting back! Goal is to reduce creepiness over time.

