Google DeepMind announces 'Genie 2,' an AI model that can generate playable 3D worlds from a single image
On December 4, 2024, Google DeepMind announced Genie 2, a foundation world model that can generate playable 3D environments from a single input image. Humans or AI agents can navigate the worlds Genie 2 generates using keyboard and mouse controls.
Genie 2: A large-scale foundation world model - Google DeepMind
Genie 2 is an autoregressive latent diffusion model trained on a large video dataset, and it exhibits various emergent capabilities, including physics, character animation, and object interactions. It can take an image generated by the image generation AI 'Imagen 3' as its input and turn it into a playable 3D environment that can be explored for up to about a minute.
For example, Imagen 3 generated the following image from the prompt 'Screenshot from a third-person open-world exploration game. The player is an adventurer exploring a forest. There is a house with a red door on the left and a house with a blue door on the right. The camera is placed directly behind the player. #photorealistic #immersive'.
The following video shows the AI agent SIMA moving through the 3D world that Genie 2 generated from this image after being instructed to 'open the blue door.'
Genie 2 responds to actions performed by pressing keys on the keyboard, identifies which object is the character and moves it correctly, and remembers parts of the world that go out of view so that they are rendered correctly when they come back into view.
Below is a video of a player moving around a 3D environment that Genie 2 generated from an image Imagen 3 had created from a text prompt.
Genie 2 can generate not only first-person and third-person perspectives, but also other viewpoints, such as a camera following behind a car as in a driving game, or an isometric view looking down at an angle.
Genie 2 also simulates physical effects such as gravity, water, and smoke, and can render complex character animations, interactions with NPCs, and realistic lighting and reflections.
In addition, Genie 2 has the ability to generate environments not only from images generated by Imagen 3, but also from real-world photographs and concept art. Google DeepMind says, 'Genie 2 enables researchers to quickly create diverse environments for training AI agents, and enables artists and designers to rapidly prototype their ideas.'
Genie 2 is still in its early stages of research and development, but Google DeepMind believes that this technology will be an important step toward safely training AI agents and advancing research toward general-purpose AI. The Genie 2 research team said they plan to continue working to improve the generality and consistency of the generative capabilities.