OpenAI announces 'Sora', an AI that generates videos while simulating physical laws from text
OpenAI, which develops the GPT series of large language models and the image generation AI DALL-E, has announced 'Sora', an AI that can generate videos of up to one minute from text. According to OpenAI, Sora can generate videos of 'complex scenes with multiple characters, specific types of motion, and precise details of the subject and background' by 'understanding how they exist in the physical world.'
Sora
https://openai.com/sora
Video generation models as world simulators
https://openai.com/research/video-generation-models-as-world-simulators
AIs that generate images from input text (prompts), such as Midjourney, Stable Diffusion, and DALL-E, have been appearing since around 2022, and as the technology has advanced, AIs that generate videos from prompts have begun to appear as well. However, these video generation AIs have difficulty accurately simulating physics in complex situations, and often produce inconsistent videos because they cannot understand the cause and effect of movement.
For example, in a generated video of a boy eating a cookie, the cookie he has bitten may show no bite marks. Spatial relationships such as left/right and depth can also be confused.
Sora is a diffusion model that uses the Transformer architecture, like the GPT models, and is characterized by generating videos while simulating physical laws. As a result, it can create videos with dynamic camera movement, such as the camera rotating and moving around the subject. OpenAI commented, 'We believe that Sora will serve as the basis for models that can understand and simulate the real world, and will be an important milestone in achieving AGI (artificial general intelligence).'
Below are some of the videos actually generated with Sora. The text directly above each video is the prompt used to generate it.
'A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.'
'Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.'
'A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.'
'Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff's edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.'
'Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.'
'The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it's tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.'
OpenAI has not said when it will release Sora, but says, 'We plan to put several important safeguards in place before making Sora available. We are working with a red team of subject matter experts to test our models adversarially.' OpenAI also says it is building a detection classifier that indicates whether a video was generated by Sora, and that it will be possible to identify AI-generated videos by looking at their metadata.