September 30, 2022, 23:00 | Science

Anonymous researcher releases "Phenaki," an AI that generates videos from text, with sample videos of an astronaut dancing and a teddy bear swimming

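The image-generating AI "Stable Diffusion" has drawn attention recently for the high quality of its output. Now an anonymous researcher has announced "Phenaki," an AI that generates videos from text.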

Phenaki
https://phenaki.video/

Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions | OpenReview
https://openreview.net/forum?id=vOEXS39nOF

At the top of the Phenaki project page are three short videos that were reportedly generated with Phenaki.

The video on the left was reportedly generated from the prompts "A photorealistic teddy bear is swimming in the ocean at San Francisco," "The teddy bear goes under water," "The teddy bear keeps swimming under the water with colorful fishes," and "A panda bear is swimming under water."

The video in the middle was generated from the prompts "A teddy bear diving in the ocean," "A teddy bear emerges from the water," "A teddy bear walks on the beach," and "Camera zooms out to the teddy bear in the campfire by the beach."

The video on the right was reportedly generated from the prompts "Side view of an astronaut is walking through a puddle on mars," "The astronaut is dancing on mars," "The astronaut walks his dog on mars," and "The astronaut and his dog watch fireworks." All three short videos hold a consistent level of quality, with no jarringly abrupt scene changes and no visual breakdown.

Another section lets you swap the words of an "astronaut" prompt and watch the short video change accordingly. The default selections are "HD Video," "Riding a horse," and "in the park at sunrise."

Here is the result of selecting "A Cartoon," "Riding a dinosaur," and "on mars with earth in the background."

And here is the combination "A Cartoon," "Swimming," and "in the park at sunrise." Unfortunately, the astronaut does not really look like it is swimming.

Phenaki can also turn a still image into a short video when given a prompt. The prompts given to the photos were, from left to right, "Camera zooms quickly into the eye of the cat," "A white cat touches the camera with the paw," and "A white cat yawns loudly."

In all three cases (left, middle, and right), the resulting video followed its prompt.

A two-minute video reportedly generated with Phenaki has also been released.

Its prompts begin with "Lots of traffic in futuristic city." and continue as follows:

"Lots of traffic in futuristic city. An alien spaceship arrives to the futuristic city. The camera gets inside the alien spaceship. The camera moves forward until showing an astronaut in the blue room. The astronaut is typing in the keyboard."

"The camera moves away from the astronaut. The astronaut leaves the keyboard and walks to the left. The astronaut leaves the keyboard and walks away. The camera moves beyond the astronaut and looks at the screen. The screen behind the astronaut displays fish swimming in the sea. Crash zoom into the blue fish."

"We follow the blue fish as it swims in the dark ocean. The camera points up to the sky through the water. The ocean and the coastline of a futuristic city. Crash zoom towards a futuristic skyscraper. The camera zooms into one of the many windows. We are in an office room with empty desks. A lion runs on top of the office desks."

"The camera zooms into the lion's face, inside the office. Zoom out to the lion wearing a dark suit in an office room. The lion wearing looks at the camera and smiles. The camera zooms out slowly to the skyscraper exterior. Timelapse of sunset in the modern city."

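According to the authors, Phenaki generates compressed, discrete video tokens from text and uses a model that is causal in time to produce videos of variable length. Note that the paper is still under review and has not yet been formally published in a peer-reviewed venue.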
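To make "discrete video tokens generated by a time-causal model" a little more concrete, here is a toy sketch in Python. It is not Phenaki's implementation: the real system relies on trained neural networks, while here a SHA-256 hash stands in for the model. The names CODEBOOK_SIZE, TOKENS_PER_FRAME, toy_next_frame, and generate_video_tokens, and their values, are hypothetical and chosen only for illustration. The sketch shows two ideas from the description above: each frame is a short list of integer token IDs drawn from a fixed codebook, and each new frame depends only on frames already generated plus the current prompt, so a video can be extended prompt by prompt to any length.

```python
import hashlib
from typing import List

# Hypothetical sizes chosen for illustration only (not Phenaki's values).
CODEBOOK_SIZE = 1024     # number of distinct discrete video tokens
TOKENS_PER_FRAME = 4     # tokens representing one compressed frame


def toy_next_frame(prev_frames: List[List[int]], prompt: str) -> List[int]:
    """Toy stand-in for a learned model: predict the next frame's tokens.

    Time-causal: it sees only frames that were already generated (here just
    the most recent one) plus the current prompt, never future frames.
    A SHA-256 hash replaces the neural network, so output is deterministic.
    """
    last = prev_frames[-1] if prev_frames else []
    context = prompt + "|" + ",".join(str(t) for t in last)
    digest = hashlib.sha256(context.encode("utf-8")).digest()
    # Turn consecutive byte pairs of the hash into token IDs in [0, CODEBOOK_SIZE).
    return [
        int.from_bytes(digest[2 * i : 2 * i + 2], "big") % CODEBOOK_SIZE
        for i in range(TOKENS_PER_FRAME)
    ]


def generate_video_tokens(prompts: List[str], frames_per_prompt: int) -> List[List[int]]:
    """Generate frames one at a time; total length grows with the prompt list."""
    frames: List[List[int]] = []
    for prompt in prompts:
        for _ in range(frames_per_prompt):
            frames.append(toy_next_frame(frames, prompt))
    return frames


if __name__ == "__main__":
    story = [
        "A teddy bear diving in the ocean",
        "A teddy bear emerges from the water",
        "A teddy bear walks on the beach",
    ]
    video = generate_video_tokens(story, frames_per_prompt=5)
    print(len(video), "frames")  # 3 prompts x 5 frames = 15 frames
```

Because generation only looks backward, changing a later prompt never alters the frames produced before it, which is what makes a story like the two-minute example possible to build up one prompt at a time. A real system would also need a decoder that turns tokens back into pixels; that step is left out of this sketch.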

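Phenaki has also been discussed on the social news site Hacker News. One commenter wrote that "it looks like it generates several still images from the text, then generates intermediate images and stitches them together," and another suggested that "the short videos contain elements that are not in the prompts, so some manual fine-tuning by humans may be involved."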

Phenaki: A model for generating minutes-long, changing-prompt videos from text | Hacker News
https://news.ycombinator.com/item?id=33025189


Posted by log1h_ik on September 30, 2022, 23:00:00 in Software, Science
