Introducing WALT, a diffusion model that generates photorealistic videos from simple text



A research team from Stanford University and Google has announced WALT, a diffusion model that generates photorealistic videos from text prompts. The team has also released many videos generated with WALT.

WALTpdf
https://walt-video-diffusion.github.io/assets/WALTpdf



Photorealistic Video Generation with Diffusion Models
https://walt-video-diffusion.github.io/

WALT is a video generation AI built on the Transformer, a deep learning architecture introduced by researchers at Google and elsewhere. Agrim Gupta of the research team explained how WALT works in a post on X (formerly Twitter).



WALT first uses a causal 3D encoder to compress images and videos into a shared latent space.
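
To make the word "causal" concrete, here is a minimal sketch of a causal convolution along the time axis. It is not WALT's actual encoder: the function name, the kernel weights, and the zero padding are assumptions made for illustration, and the sketch does not perform any compression. It only shows why causality lets a single image be treated as a one-frame video: the output for the first frame never depends on later frames.

```python
def causal_temporal_conv(frames, kernel):
    """Convolve along time so that output t depends only on frames 0..t.

    frames: list of frames, each a list of floats (flattened pixels).
    kernel: temporal weights; kernel[-1] multiplies the current frame.
    Hypothetical helper for illustration, not WALT's real encoder.
    """
    k = len(kernel)
    width = len(frames[0])
    # Left-pad with k-1 zero frames so no output looks at future frames.
    padded = [[0.0] * width for _ in range(k - 1)] + frames
    out = []
    for t in range(len(frames)):
        window = padded[t:t + k]  # frames t-k+1 .. t (zeros before frame 0)
        out.append([
            sum(w * frame[i] for w, frame in zip(kernel, window))
            for i in range(width)
        ])
    return out


kernel = [0.25, 0.25, 0.5]                     # assumed weights
video = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 frames, 2 "pixels" each
image = [video[0]]                             # the same first frame, alone

video_latents = causal_temporal_conv(video, kernel)
image_latents = causal_temporal_conv(image, kernel)

# The first frame is encoded identically with or without the later frames.
print(video_latents[0] == image_latents[0])
```

Because the first output is the same in both cases, images and videos can pass through one encoder and end up in the same latent space, which is the property the post describes.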



The team then uses a windowed attention architecture tailored for joint spatial and temporal generative modeling in that latent space, which improves memory and training efficiency.
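
The efficiency gain from windowed attention can be illustrated with a rough count of query-key pairs. The sketch below is an assumption-laden illustration rather than WALT's implementation: the latent grid size, the window shapes, and the helper names are all hypothetical. It splits a (frames, height, width) grid of latent tokens into non-overlapping windows and compares the cost of attending within each window against full attention over every token.

```python
from itertools import product


def partition_windows(shape, window):
    """Group token coordinates of a (T, H, W) grid into non-overlapping windows."""
    T, H, W = shape
    wt, wh, ww = window
    if T % wt or H % wh or W % ww:
        raise ValueError("window must evenly divide the grid")
    windows = {}
    for t, h, w in product(range(T), range(H), range(W)):
        key = (t // wt, h // wh, w // ww)  # which window this token falls in
        windows.setdefault(key, []).append((t, h, w))
    return list(windows.values())


def attention_pairs(windows):
    """Query-key pairs when each token attends only within its own window."""
    return sum(len(win) ** 2 for win in windows)


shape = (4, 8, 8)  # hypothetical latent grid: 4 frames of 8x8 tokens

# One window covering the whole grid is ordinary full attention.
full = attention_pairs(partition_windows(shape, shape))
# Spatial windows: each frame attends only within itself.
spatial = attention_pairs(partition_windows(shape, (1, 8, 8)))
# Spatiotemporal windows: small spatial patches across all frames.
spatiotemporal = attention_pairs(partition_windows(shape, (4, 4, 4)))

print(full, spatial, spatiotemporal)
```

With these assumed sizes, full attention needs 65,536 pairs while each windowed variant needs 16,384, a fourfold reduction. The saving grows as the grid gets larger relative to the window.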



Together, these design choices allow WALT to generate photorealistic, temporally consistent motion from natural language prompts.



The research team has published many example videos generated with WALT. Some of them are shown below.

Video of "Raccoon wearing a black jacket dancing slowly in front of the pyramid" generated with WALT (YouTube)


Video of "Aerial photography of a beautiful castle surrounded by water" generated with WALT (YouTube)


Video of "Dog wearing VR goggles at dusk" generated with WALT (YouTube)


Video of "astronaut riding a horse" generated with WALT (YouTube)


Video of "Elephant walking on the beach wearing a birthday hat" generated with WALT (YouTube)


Other videos published by the research team can be viewed on the following webpage.

Photorealistic Video Generation with Diffusion Models
https://walt-video-diffusion.github.io/samples.html

in Software, Video. Posted by log1h_ik