Google announces 'D4RT,' an AI model that perceives the four dimensions of space and time, a step toward developing 'AI that can recognize the world in the same way as humans'

Google DeepMind has developed D4RT, an AI model that reconstructs 3D space over time from video. Compared to existing models, D4RT performs spatial recognition with higher accuracy and speed, and is expected to help in developing AI that perceives the world the way humans do.
D4RT
D4RT: Unified, Fast 4D Scene Reconstruction & Tracking - Google DeepMind
https://deepmind.google/blog/d4rt-teaching-ai-to-see-the-world-in-four-dimensions/
Humans perceive three-dimensional space from visual information and predict what will happen next from the immediate past and the present. To give AI the same ability to perceive the world, it is not enough to recognize images captured by a camera. The AI must also perceive four dimensions, combining space and time, by reconstructing three-dimensional space from camera images and understanding how things move over time.
D4RT reconstructs three-dimensional space from the images recorded by a camera and can track every pixel of every object over time.
To perceive a 2D scene captured on video, an AI must track every pixel of every object as it moves. 🔍️️
Capturing this level of geometry and motion requires computationally intensive processes leading to slow and fragmented reconstructions. But D4RT takes a different… pic.twitter.com/LraeC1bWUE
— Google DeepMind (@GoogleDeepMind) January 22, 2026
Building a comparable 4D recognition system from existing AI models requires combining several specialized models, such as ones for depth estimation, motion tracking, and camera angle estimation, which makes processing slow. D4RT, by contrast, performs all of the necessary processing with a single Transformer-based model, achieving both accuracy and speed.

The graph below compares the 4D recognition performance of various AI models. D4RT outperforms the existing models. Furthermore, while existing technology takes 10 minutes to process a one-minute video, D4RT completes the same processing in approximately 5 seconds. Google DeepMind claims that 'D4RT is 120 times faster than existing technology.'
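The 120x figure follows directly from the two processing times quoted above. As a quick check, the short sketch below reproduces the arithmetic; the variable names are illustrative only, and the inputs are the approximate figures reported in this article, not independent measurements.

```python
# Approximate figures as quoted in the article (not independent measurements).
video_seconds = 60            # length of the input video: one minute
baseline_seconds = 10 * 60    # existing technology: about 10 minutes
d4rt_seconds = 5              # D4RT: approximately 5 seconds

# Speedup of D4RT relative to existing technology.
speedup = baseline_seconds / d4rt_seconds

# How many seconds of video D4RT handles per second of processing.
realtime_factor = video_seconds / d4rt_seconds

print(f"Speedup over existing technology: {speedup:.0f}x")
print(f"D4RT throughput: about {realtime_factor:.0f}x real time")
```

By the same figures, existing technology runs at about one tenth of real-time speed, while D4RT processes video roughly 12 times faster than it plays.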

The D4RT technical paper is available at the following link:
[2512.08924] Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
https://arxiv.org/abs/2512.08924