TikTok's parent company ByteDance announces 'MagicVideo-V2', an AI that generates high-quality and faithful videos from text



A research team at ByteDance, the parent company of TikTok, has announced MagicVideo-V2, an AI that generates high-quality videos that are faithful to the input text. In evaluations by human raters, MagicVideo-V2 outperformed other state-of-the-art AIs that generate videos from text.

[2401.04468] MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

https://arxiv.org/abs/2401.04468



MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
https://magicvideov2.github.io/

MagicVideo-V2 is an AI that generates videos from text, developed by ByteDance's research team. The basic structure is as follows. First, the 'T2I (Text to Image)' module generates a 1024 x 1024 pixel image based on the input text. Next, the 'I2V (Image to Video)' module turns that still image into a continuous video of 32 frames at 600 x 600 pixels, and then the 'V2V (Video to Video)' module raises the resolution of those frames to 1048 x 1048 pixels. Finally, the 'Interpolation' module expands the sequence to 94 frames.
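To make the four stages easier to follow, here is a minimal Python sketch that only traces the frame count and resolution reported for each stage. It does not run any model. The names `Clip`, `t2i`, `i2v`, `v2v`, `interpolate`, and `trace_pipeline` are hypothetical and are not ByteDance's code. The article does not say how the Interpolation module gets from 32 to 94 frames; the sketch assumes two new frames are inserted between each adjacent pair, which is one reading consistent with the numbers (32 + 31 x 2 = 94).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Shape of a stage's output: frame count and resolution in pixels."""
    frames: int
    width: int
    height: int


def t2i() -> Clip:
    # T2I: a single 1024 x 1024 still image generated from the text prompt.
    return Clip(frames=1, width=1024, height=1024)


def i2v(image: Clip) -> Clip:
    # I2V: the still image becomes a 32-frame video at 600 x 600.
    return Clip(frames=32, width=600, height=600)


def v2v(video: Clip) -> Clip:
    # V2V: same number of frames, resolution raised to 1048 x 1048.
    return Clip(frames=video.frames, width=1048, height=1048)


def interpolate(video: Clip, inserted_per_gap: int = 2) -> Clip:
    # Interpolation: ASSUMPTION, not stated in the article. Inserting two
    # frames into each of the 31 gaps between 32 frames gives 94 frames.
    gaps = video.frames - 1
    total = video.frames + gaps * inserted_per_gap
    return Clip(frames=total, width=video.width, height=video.height)


def trace_pipeline() -> list[tuple[str, Clip]]:
    """Return the output shape after each stage, in order."""
    image = t2i()
    low_res = i2v(image)
    high_res = v2v(low_res)
    final = interpolate(high_res)
    return [
        ("T2I", image),
        ("I2V", low_res),
        ("V2V", high_res),
        ("Interpolation", final),
    ]


if __name__ == "__main__":
    for name, clip in trace_pipeline():
        print(f"{name}: {clip.frames} frame(s) at {clip.width} x {clip.height}")
```

Running the script prints one line per stage, which makes it easy to see that resolution changes only in the I2V and V2V steps, while the frame count changes only in the I2V and Interpolation steps.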



Human evaluators compared videos generated by MagicVideo-V2 with videos created by other cutting-edge video generation AIs such as Moonvalley, Pika 1.0, Morph Studio, Gen-2, and Stable Video Diffusion XT (SVD-XT). The graph showing the results is below. The green bar, which shows the percentage of people who answered 'MagicVideo-V2 is better', exceeds a majority against each of these AIs, indicating that the videos generated by MagicVideo-V2 are highly rated.



The following is an example of a video actually generated by 'MagicVideo-V2' published by the research team.

Video of 'Walking Rabbit in Purple Robe' generated by ByteDance's AI 'MagicVideo-V2' - YouTube


Video of 'Girl in a pink dress playing the piano' generated by ByteDance's AI 'MagicVideo-V2' - YouTube


'Selfie panda' video generated by ByteDance's AI 'MagicVideo-V2' - YouTube


The official page also publishes comparisons of videos generated by MagicVideo-V2, SVD-XT, and Pika 1.0 from the same prompt. Below is a comparison with the prompt 'A little boy is riding a bike on a park path, the wheels crunching on the gravel'. From left to right, the videos were generated by MagicVideo-V2, SVD-XT, and Pika 1.0. At first glance, the video generated by MagicVideo-V2 seems to be about the same level as SVD-XT and more accurate than Pika 1.0.



Here is the same comparison for the prompt 'A fox dressed in suit dancing in park.'



in Software, Web Service, Video, Posted by log1h_ik