An anonymous researcher announces 'Phenaki', an AI that generates video from text, with sample videos of dancing astronauts and swimming teddy bears



The image generation AI 'Stable Diffusion' has recently become a hot topic because of its high quality, and now an anonymous researcher has announced 'Phenaki', an AI that generates videos from text.

Phenaki

https://phenaki.video/

Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions | OpenReview
https://openreview.net/forum?id=vOEXS39nOF

At the top of the 'Phenaki' project page are three short videos said to have been generated by 'Phenaki'.



The leftmost video is said to have been generated from the prompts 'A photorealistic teddy bear is swimming in the ocean at San Francisco', 'The teddy bear goes under water', 'The teddy bear keeps swimming under the water with colorful fishes', and 'A panda bear is swimming under water'. Click the image below to watch the short video.



The middle video was generated from the prompts 'A teddy bear diving in the ocean', 'A teddy bear emerges from the water', 'A teddy bear walks on the beach', and 'Camera zooms out to the teddy bear in the campfire by the beach'. Click the image to watch the short video.



The rightmost video was generated from the prompts 'Side view of an astronaut is walking through a puddle on mars', 'The astronaut is dancing on mars', 'The astronaut walks his dog on mars', and 'The astronaut and his dog watch fireworks'. You can watch the short video by clicking the image. All three short videos maintain a consistent level of quality, with no abrupt scene changes and no collapse of the picture.



There is also an interactive section where you can swap out parts of a prompt about an astronaut and watch a short video that changes accordingly. The default selections are 'HD Video', 'Riding a horse', and 'in the park at sunrise'.



Selecting 'A Cartoon', 'Riding a dinosaur', and 'on mars with earth in the background' gives this result.



Combining 'A Cartoon', 'Swimming', and 'in the park at sunrise' gives this result. Unfortunately, the astronaut does not really look like it is swimming.
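The three-slot selection described above can be pictured as a small combinatorial prompt builder. The following is only an illustrative sketch: it lists just the options mentioned in this article, and the sentence template in `build_prompts` is a hypothetical assumption, since the exact wording the demo sends to the model is not stated.

```python
from itertools import product

# Only the options mentioned in this article; the real demo may offer others.
STYLES = ["HD Video", "A Cartoon"]
ACTIONS = ["Riding a horse", "Riding a dinosaur", "Swimming"]
SETTINGS = ["in the park at sunrise", "on mars with earth in the background"]


def build_prompts(subject="astronaut"):
    """Return one prompt string per (style, action, setting) combination."""
    prompts = []
    for style, action, setting in product(STYLES, ACTIONS, SETTINGS):
        # Hypothetical template: the demo's actual prompt wording is unknown.
        prompts.append(f"{style} of an {subject} {action.lower()} {setting}")
    return prompts


prompts = build_prompts()
print(len(prompts))  # 2 styles * 3 actions * 2 settings = 12
print(prompts[0])    # HD Video of an astronaut riding a horse in the park at sunrise
```

Each slot is independent, so the number of distinct videos the section can show grows as the product of the option counts.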



It is also possible to give a prompt together with a still image and turn the image into a short video. The prompts given to the photos are, from the left, 'Camera zooms quickly into the eye of the cat', 'A white cat touches the camera with the paws', and 'A white cat yawns loudly'.



Leftmost



Middle



Rightmost. All three videos followed their prompts.



A 2-minute video generated by 'Phenaki' has also been released.



Click the image below to play the video. The prompt starts with 'Lots of traffic in futuristic city.'



'Lots of traffic in futuristic city. An alien spaceship arrives to the futuristic city. The camera gets inside the alien spaceship. The camera moves forward until showing an astronaut in the blue room. The astronaut is typing on a keyboard.'



'The camera moves away from the astronaut. The astronaut leaves the keyboard and walks to the left. The astronaut leaves the keyboard and walks away. The camera moves beyond the astronaut and looks at the screen. The screen behind the astronaut displays fish swimming in the sea. Crash zoom into the blue fish.'



'We follow the blue fish as it swims in the dark ocean. The camera points up to the sky through the water. The ocean and the coastline of a futuristic city. Crash zoom towards a futuristic skyscraper. The camera zooms into one of the many windows. We are in an office room with empty desks. A lion runs on top of the office desks.'



'The camera zooms into the lion's face, inside the office. Zoom out to the lion wearing a dark suit in an office room. The lion wearing looks at the camera and smiles. The camera zooms out slowly to the skyscraper exterior. Timelapse of sunset in the modern city.'



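According to the author, 'Phenaki' compresses video into a compact sequence of discrete tokens and generates those video tokens from text. The model that produces the tokens is causal in time, meaning each frame depends only on the frames before it, which is what allows it to handle videos of variable length. Please note that the paper is still under peer review and has not been published in an official academic journal.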
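To make the idea of a model that is causal in time more concrete, here is a minimal toy sketch. It is not Phenaki's implementation: the function name `frame_causal_mask`, the frame counts, and the number of tokens per frame are all assumptions chosen for illustration. It builds an attention mask in which a token may look at tokens from its own frame and from earlier frames, but never from later frames.

```python
def frame_causal_mask(num_frames, tokens_per_frame):
    """Build a toy frame-level causal mask.

    mask[i][j] is True when token i may attend to token j, i.e. when
    token j belongs to the same frame as token i or to an earlier frame.
    """
    total = num_frames * tokens_per_frame
    # Frame index of every token, with tokens laid out frame by frame.
    frame_of = [idx // tokens_per_frame for idx in range(total)]
    return [
        [frame_of[j] <= frame_of[i] for j in range(total)]
        for i in range(total)
    ]


short = frame_causal_mask(num_frames=2, tokens_per_frame=2)
longer = frame_causal_mask(num_frames=3, tokens_per_frame=2)

# Appending a frame leaves the earlier frames' view unchanged:
# the top-left corner of the longer mask equals the shorter mask.
prefix = [row[:len(short)] for row in longer[:len(short)]]
print(prefix == short)  # True
```

Because earlier frames never see later ones, adding frames does not change how the existing frames are processed, which is one intuition for why a temporally causal design suits videos of variable length.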

'Phenaki' is also being discussed on the social news site Hacker News. One commenter said, 'It looks like it generates several static images based on text, then generates intermediate images and stitches them together.' Another commented, 'The short video contains elements that the prompt doesn't have, and I suspect there may be some human tweaks.'

Phenaki: A model for generating minutes-long, changing-prompt videos from text | Hacker News
https://news.ycombinator.com/item?id=33025189

in Software, Science, Posted by log1h_ik