TikTok's parent company ByteDance announces 'MagicVideo-V2', an AI that generates high-quality videos faithful to text prompts
![](https://i.gzn.jp/img/2024/01/18/magicvideov2-ai-aesthetic-video-generation/00_m.jpg)
A research team at ByteDance, the parent company of TikTok, has announced MagicVideo-V2, an AI that generates high-quality videos that are faithful to the input text. In evaluations by human raters, MagicVideo-V2 outperformed other state-of-the-art AIs that generate videos from text.
[2401.04468] MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
![](https://i.gzn.jp/img/2024/01/18/magicvideov2-ai-aesthetic-video-generation/img-snap4934_m.png)
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
https://magicvideov2.github.io/
MagicVideo-V2 is a text-to-video AI developed by ByteDance's research team. Its basic structure is as follows. First, the 'T2I (Text to Image)' module generates a 1024 x 1024 pixel image from the input text. Next, the 'I2V (Image to Video)' module turns that still image into a continuous video of 32 frames at 600 x 600 pixels, and the 'V2V (Video to Video)' module then raises the resolution of those frames to 1048 x 1048 pixels. Finally, the 'Interpolation' module extends the sequence to 94 frames.
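To make the four stages easier to follow, here is a minimal Python sketch that only tracks the frame count and resolution reported for each stage. It does not generate any video and is not ByteDance's code: the `Clip` class and the functions `t2i`, `i2v`, `v2v`, `interpolate`, and `trace` are hypothetical names made up for illustration. The `inserted_per_gap=2` default is also an assumption, chosen because inserting two frames into each of the 31 gaps between 32 frames gives 32 + 2 x 31 = 94, which matches the reported figure; the article itself does not say how the interpolation is done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Shape of a stage's output: frame count and resolution only (no pixels)."""
    frames: int
    height: int
    width: int


def t2i(prompt: str) -> Clip:
    # T2I: one 1024 x 1024 reference image from the text prompt.
    return Clip(frames=1, height=1024, width=1024)


def i2v(image: Clip) -> Clip:
    # I2V: 32 frames at 600 x 600 from the still image.
    return Clip(frames=32, height=600, width=600)


def v2v(clip: Clip) -> Clip:
    # V2V: same number of frames, resolution raised to 1048 x 1048.
    return Clip(frames=clip.frames, height=1048, width=1048)


def interpolate(clip: Clip, inserted_per_gap: int = 2) -> Clip:
    # ASSUMPTION: a fixed number of new frames is inserted into every gap
    # between adjacent frames. The value 2 is a guess that fits 32 -> 94.
    gaps = clip.frames - 1
    return Clip(clip.frames + inserted_per_gap * gaps, clip.height, clip.width)


def trace(prompt: str) -> list[tuple[str, Clip]]:
    """Run the shape-only pipeline and record each stage's output."""
    stages = []
    out = t2i(prompt)
    stages.append(("T2I", out))
    out = i2v(out)
    stages.append(("I2V", out))
    out = v2v(out)
    stages.append(("V2V", out))
    out = interpolate(out)
    stages.append(("Interpolation", out))
    return stages


if __name__ == "__main__":
    for name, clip in trace("A fox dressed in suit dancing in park."):
        print(f"{name}: {clip.frames} frame(s) at {clip.width} x {clip.height}")
```

Running the script prints one line per stage, ending with 94 frames at 1048 x 1048 for the 'Interpolation' stage.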
![](https://i.gzn.jp/img/2024/01/18/magicvideov2-ai-aesthetic-video-generation/01_m.jpg)
Humans compared videos generated by MagicVideo-V2 with videos created by other cutting-edge video generation AIs such as
![](https://i.gzn.jp/img/2024/01/18/magicvideov2-ai-aesthetic-video-generation/02_m.jpg)
The following are examples of videos generated by 'MagicVideo-V2' that the research team has published.
Video of 'Girl in a pink dress playing the piano' generated by ByteDance's AI 'MagicVideo-V2' - YouTube
'Selfie panda' video generated by ByteDance's AI 'MagicVideo-V2' - YouTube
The official page also publishes comparisons of videos generated by MagicVideo-V2, SVD-XT, and Pika 1.0 from the same prompt. Below is the comparison for the prompt 'A little boy is riding a bike on a park path, the wheels crunching on the gravel'. From left to right, the videos were generated by MagicVideo-V2, SVD-XT, and Pika 1.0. At first glance, the video generated by MagicVideo-V2 appears to be on about the same level as SVD-XT and more accurate than Pika 1.0.
![](https://i.gzn.jp/img/2024/01/18/magicvideov2-ai-aesthetic-video-generation/03_m.jpg)
The same comparison was also made for the prompt 'A fox dressed in suit dancing in park.'
in Software, Web Service, Video, Posted by log1h_ik