Apr 18, 2023 16:00:00

Meta announces computer vision model 'DINOv2', raising the possibility of AI-generated immersive VR environments in the future

On April 17, 2023, Meta announced 'DINOv2', a new method for training computer vision models. By advancing visual understanding through self-supervised learning, the method is expected to pave the way for generative AI that can construct VR worlds from simple instructions and prompts.

[2304.07193] DINOv2: Learning Robust Visual Features without Supervision
https://doi.org/10.48550/arXiv.2304.07193

DINOv2: State-of-the-art computer vision models with self-supervised learning

https://ai.facebook.com/blog/dino-v2-computer-vision-self-supervised-learning/

Meta Outlines its Latest Image Recognition Advances, Which Could Facilitate its Metaverse Vision | Social Media Today

https://www.socialmediatoday.com/news/Meta-Shares-Latest-Image-Recognition-Developments/647894/

Below is a demonstration video of DINOv2 released by Meta. DINOv2 extends Meta's previously announced image model 'DINO'; it can process video frames and produce more accurate segmentations than earlier methods.

Announced by Mark Zuckerberg this morning — today we're releasing DINOv2, the first method for training computer vision models that uses self-supervised learning to achieve results matching or exceeding industry standards.

More on this new work ➡️ https://t.co/h5exzLJsFt pic.twitter.com/2pdxdTyxC4
— MetaAI (@MetaAI) April 17, 2023

According to Meta, image-text pre-training, the standard approach to visual tasks so far, relies on manually written captions, so any information not explicitly mentioned in the text is ignored.

For example, if a photo of a chair in a room is captioned 'one oak chair', the information about what kind of room it is gets lost. The need for human captioning can also become a bottleneck: only a handful of experts can correctly label microscopic images of cells, for instance.

However, because DINOv2 uses self-supervised learning and does not require human captions, the model can also learn from backgrounds and from data that are difficult for humans to describe. It is also said to be useful for building AI that understands what is in an image and what should be placed where depending on the context, which is a particular advantage for developing VR content.
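To make the idea of learning without captions concrete, here is a minimal sketch of a DINO-style self-distillation loss, the kind of objective that DINOv2 builds on. It is a simplified illustration, not Meta's implementation: the function names, the three-dimensional toy outputs, and the temperature values are all assumptions chosen for the example. The point is that the training target comes from the model's own output on another view of the same image, so no human-written label appears anywhere.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of a list of floats at a given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - peak - log_total for x in scaled]


def softmax(logits, temperature):
    """Softmax computed by exponentiating the stable log-softmax."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def self_distillation_loss(student_logits, teacher_logits, center,
                           student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student outputs for two views of one image.

    The teacher output is centered and sharpened with a lower temperature,
    then used as the target for the student. The temperatures here are
    illustrative values, not a tuned configuration.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = softmax(centered, teacher_temp)
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))


# Toy outputs for two crops of the same (hypothetical) image.
teacher_view = [2.0, 0.5, -1.0]
center = [0.0, 0.0, 0.0]
loss_agree = self_distillation_loss([2.0, 0.5, -1.0], teacher_view, center)
loss_disagree = self_distillation_loss([-1.0, 0.5, 2.0], teacher_view, center)
print(loss_agree < loss_disagree)
```

The loss is small when the student agrees with the teacher about the two views and large when it does not, which is the only supervision signal in this sketch.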

In collaboration with WRI Restoration, a non-profit organization that carries out nature restoration projects, Meta succeeded in mapping forests tree by tree across areas the size of continents.

Models like this will be useful in a wide variety of applications. For example, we recently collaborated with @RestoreForward to use AI to map forests, tree-by-tree, across areas the size of continents. pic.twitter.com/T2we4cqTa4
— MetaAI (@MetaAI) April 17, 2023

On the DINOv2 demo site published by Meta, you can try depth estimation on a photo yourself. When I loaded a landscape photo as a test, the model correctly picked out the trees, the sea of clouds, and the mountains visible beyond them.

It also accurately captures the outlines of baby foxes popping out of their burrows.

DINOv2 is expected to improve digital backgrounds for video chats, tagging of video content, and new types of AR content and visual tools. Overseas media outlet Social Media Today commented that this could make AI-generated VR worlds possible and, eventually, allow entire interactive virtual environments to be built.

Meta has open sourced DINOv2 and made it available to anyone on GitHub as PyTorch code.

GitHub - facebookresearch/dinov2: PyTorch code and models for the DINOv2 self-supervised learning method.

https://github.com/facebookresearch/dinov2
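As a rough sketch of how the published code might be tried out, the following loads one of the pretrained backbones through `torch.hub` and embeds a single image-shaped tensor. The entry point name `dinov2_vits14` comes from the repository; the helper names `crop_to_patch_multiple` and `embed_random_image`, the random input, and the 230-pixel starting size are assumptions made for illustration. Running `embed_random_image()` requires PyTorch and a network connection to download the weights.

```python
def crop_to_patch_multiple(height, width, patch_size=14):
    """Largest (height, width) not exceeding the input that is divisible by patch_size.

    The "14" in the model names is the patch size, so image sides are
    trimmed to a multiple of it before being passed to the backbone.
    """
    return (height // patch_size) * patch_size, (width // patch_size) * patch_size


def embed_random_image(model_name="dinov2_vits14"):
    """Download a pretrained backbone via torch.hub and embed one random tensor."""
    import torch  # imported lazily so the helper above works without PyTorch

    model = torch.hub.load("facebookresearch/dinov2", model_name)
    model.eval()
    height, width = crop_to_patch_multiple(230, 230)  # 230 is trimmed to 224
    image = torch.rand(1, 3, height, width)  # stand-in for a normalized RGB image
    with torch.no_grad():
        return model(image)  # image-level features for the batch
```

In real use the random tensor would be replaced by a photo that has been resized and normalized; the features returned can then feed a small task-specific head, such as a classifier.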

Apr 18, 2023 16:00:00 in Software, Posted by log1l_ks