Apple's AI research team releases AI model 'Depth Pro,' which can generate a 2.25 million pixel 3D depth map in 0.3 seconds using a single image on a standard GPU



Apple's AI research team has released a model called Depth Pro that significantly advances the way machines perceive depth. It can accurately recognize the depth of objects with fine details such as hair and vegetation that are often overlooked by other methods, and can generate high-resolution depth maps in just 0.3 seconds. It is expected to be applicable to systems that estimate depth in real time, such as self-driving cars.

[2410.02073] Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
https://arxiv.org/abs/2410.02073

GitHub - apple/ml-depth-pro: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
https://github.com/apple/ml-depth-pro

Apple releases Depth Pro, an AI model that rewrites the rules of 3D vision | VentureBeat
https://venturebeat.com/ai/apple-releases-depth-pro-an-ai-model-that-rewrites-the-rules-of-3d-vision/

Depth Pro is a model that creates high-resolution depth maps from a single image. It excels at capturing fine details such as animal fur and birdcage wires, and can generate a 2.25-megapixel depth map (roughly 2.25 million pixels) in just 0.3 seconds.
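For a sense of scale, the arithmetic behind these figures can be checked in a few lines. The 1536 x 1536 output size used here comes from the Depth Pro paper rather than from this article, so treat it as an assumption; at that size, "2.25 megapixels" means 2.25 x 1024 x 1024 pixels, slightly more than 2.25 million. The 0.3-second figure is the one quoted above.

```python
# Back-of-the-envelope check of the headline numbers.
# Assumption: a square 1536 x 1536 depth map (resolution reported in the paper).
SIDE_PX = 1536
SECONDS_PER_MAP = 0.3  # generation time quoted in the article

total_pixels = SIDE_PX * SIDE_PX
megapixels_binary = total_pixels / 2**20   # "mega" counted as 1024 * 1024
megapixels_decimal = total_pixels / 1e6    # "mega" counted as one million

# Throughput implied by one depth map every 0.3 seconds.
maps_per_second = 1 / SECONDS_PER_MAP
pixels_per_second = total_pixels / SECONDS_PER_MAP

print(f"pixels per depth map: {total_pixels:,}")
print(f"megapixels (binary):  {megapixels_binary}")
print(f"megapixels (decimal): {megapixels_decimal:.2f}")
print(f"depth maps per second: {maps_per_second:.1f}")
print(f"depth values per second: {pixels_per_second:,.0f}")
```

At roughly three depth maps per second, this is fast for a single high-resolution frame, although still well below typical video frame rates.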



Depth maps are typically built from multiple images plus camera metadata such as focal length, so estimating depth from a single image has been difficult.

The image below compares Depth Pro's output with that of other depth estimation models (Marigold, Depth Anything v2, Metric3D v2). Depth Pro captures details such as animal hair that the other models miss.



Below are images of a windmill and a zebra. In the other models, the zebra's body blends into the background, but Depth Pro captures it clearly.



According to the researchers, what sets Depth Pro apart is that it estimates absolute depth, known as 'metric depth,' in addition to relative depth. Metric depth provides real-world measurements, which are essential for applications such as augmented reality (AR), where virtual objects must be placed at precise locations in physical space.
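The difference between the two kinds of depth can be illustrated with a toy example. Relative depth only says how near or far points are compared with each other, so it stays the same when a scene is uniformly scaled, whereas metric depth carries actual distances. The depth values and the min-max normalization below are illustrative assumptions, not Depth Pro's output or method.

```python
import math


def to_relative(depth_m):
    """Min-max normalize metric depths to [0, 1]; the scale in meters is lost."""
    near, far = min(depth_m), max(depth_m)
    return [(d - near) / (far - near) for d in depth_m]


# Hypothetical metric depths (in meters) for three points in two scenes.
tabletop_scene = [0.5, 1.0, 2.0]
street_scene = [5.0, 10.0, 20.0]  # same layout, ten times larger

relative_tabletop = to_relative(tabletop_scene)
relative_street = to_relative(street_scene)

# The relative maps are indistinguishable even though the scenes differ in size.
same_relative = all(
    math.isclose(a, b) for a, b in zip(relative_tabletop, relative_street)
)
print("relative depth identical:", same_relative)
print("farthest point, tabletop:", max(tabletop_scene), "m")
print("farthest point, street:  ", max(street_scene), "m")
```

A renderer placing a virtual object "1 meter in front of the wall" needs the metric values; the normalized ones cannot tell a tabletop from a street.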

Another feature is that it generates depth maps from images alone, without requiring camera metadata. Expected applications include checking whether furniture will fit in a room simply by pointing a smartphone camera at the room, and improving the safety of self-driving cars through real-time depth estimation.
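As a rough sketch of how metric depth turns into a measurement such as furniture width, the standard pinhole camera model relates a size in pixels to a size in meters through the depth and the focal length expressed in pixels. The function name and all numbers below are hypothetical; with no camera metadata available, the focal length would itself have to be estimated from the image.

```python
def object_width_m(width_px, depth_m, focal_length_px):
    """Pinhole camera model: real width = pixel width * depth / focal length.

    Assumes the object faces the camera and lies at a single depth.
    """
    return width_px * depth_m / focal_length_px


# Hypothetical measurements for a sofa seen in a photo.
sofa_width_px = 900       # horizontal extent of the sofa in the image
sofa_depth_m = 2.5        # metric depth at the sofa, from a depth map
focal_length_px = 1500    # focal length in pixels (estimated, not from metadata)

sofa_width_m = object_width_m(sofa_width_px, sofa_depth_m, focal_length_px)

alcove_width_m = 1.6      # hypothetical space the sofa should fit into
fits = sofa_width_m <= alcove_width_m

print(f"estimated sofa width: {sofa_width_m:.2f} m")
print("fits in alcove:", fits)
```

Because the result scales linearly with depth, any error in the depth estimate carries over to the measured width in the same proportion, which is why absolute rather than relative depth matters here.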

The code and model weights for Depth Pro are publicly available on GitHub, and the research team encourages others to explore the potential of Depth Pro in fields such as robotics, manufacturing, and healthcare.

in Software, Posted by log1p_kr