February 16, 2024, 10:50 Software

OpenAI announces "Sora," an AI that generates video from text while simulating the laws of physics

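OpenAI, the developer of the GPT series of large language models and the image generation AI DALL-E, has announced "Sora," an AI that can generate videos up to one minute long from text. According to OpenAI, Sora can generate "complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background" with an understanding of "how those things exist in the physical world."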

Sora
https://openai.com/sora

Video generation models as world simulators
https://openai.com/research/video-generation-models-as-world-simulators

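AI systems that generate images from input text (a prompt), such as Midjourney, Stable Diffusion, and DALL-E, have been appearing since around 2022, and as the technology has advanced, AI that generates video from a prompt has appeared as well. However, these video generation AIs have difficulty simulating physics accurately in complex situations and cannot understand the cause and effect of motion, so they often produce videos that contain inconsistencies.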

For example, in a generated video of a boy eating a cookie, the cookie he has just bitten may show no bite mark. Spatial information such as left and right or depth can also become confused.

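Sora is a diffusion model that, like the GPT models, uses a Transformer architecture, and its distinguishing feature is that it generates video while simulating physics. As a result, it can produce footage with dynamic movement, such as the camera swinging around or circling the subject. OpenAI commented, "Sora serves as a foundation for models that can understand and simulate the real world, and we believe it will be an important milestone toward achieving AGI (artificial general intelligence)."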
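To give a rough sense of what "diffusion model" means here, the following is a toy sketch and not OpenAI's implementation. It blends a clean signal with Gaussian noise and then inverts the blend using the known noise. In a real diffusion model, a trained network (a Transformer in Sora's case) would have to predict that noise. The names `add_noise`, `recover_clean`, `alpha_bar`, and the four-number `patch` are hypothetical and chosen only for illustration.

```python
import math
import random


def add_noise(clean, alpha_bar, rng):
    """Forward step: blend the clean values with Gaussian noise.

    alpha_bar in (0, 1] controls how much of the clean signal survives.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(clean, noise)
    ]
    return noisy, noise


def recover_clean(noisy, predicted_noise, alpha_bar):
    """Invert the blend, given a prediction of the noise that was added."""
    return [
        (y - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for y, e in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
patch = [0.2, -0.5, 0.9, 0.0]  # hypothetical stand-in for one video patch
noisy, noise = add_noise(patch, alpha_bar=0.5, rng=rng)

# Assumption: we pass the true noise as an "oracle" prediction.
# A trained model would only approximate it, and would do so over many steps.
restored = recover_clean(noisy, noise, alpha_bar=0.5)
```

With the oracle noise the original values come back up to floating-point error. The quality of a real model depends on how well its network predicts the noise.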

Below are some of the videos actually generated by Sora. The text directly above each video is the prompt used to generate it.

"A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about."

"Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field."

"A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors."

"Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway."

"Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee."

"The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds."

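OpenAI has not said when it will release Sora, but it states, "We plan to take several important safety steps before making Sora available. We will work with red teamers, experts in areas such as misinformation, hateful content, and bias, who will adversarially test the model." OpenAI also says it will set up tools such as a detection classifier that indicates whether a video was generated by Sora, and will make it possible to tell from a video's metadata whether it was generated by AI.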