Meta announces computer vision model 'DINOv2', raising the possibility of AI-built immersive VR environments in the future



On April 17, 2023, Meta announced 'DINOv2', a new method for training computer vision models. By enabling advanced visual understanding through self-supervised learning, the method is expected to pave the way for generative AI that can construct VR worlds from simple instructions and prompts.

[2304.07193] DINOv2: Learning Robust Visual Features without Supervision
https://doi.org/10.48550/arXiv.2304.07193



DINOv2: State-of-the-art computer vision models with self-supervised learning

https://ai.facebook.com/blog/dino-v2-computer-vision-self-supervised-learning/



Meta Outlines its Latest Image Recognition Advances, Which Could Facilitate its Metaverse Vision | Social Media Today

https://www.socialmediatoday.com/news/Meta-Shares-Latest-Image-Recognition-Developments/647894/



Below is a demonstration video of DINOv2 released by Meta. DINOv2 is an extension of Meta's previously announced image model 'DINO', and it can process video and produce more accurate segmentation than earlier methods.



According to Meta, image-text pre-training, the standard approach for visual tasks to date, relies on manually written captions, so any information that is not explicitly mentioned in those captions is ignored.

For example, if a photo of a chair in a room is captioned 'a single oak chair', information about what kind of room it is gets lost. The need for human captioning can also become a bottleneck: only a handful of experts can correctly label microscopic images of cells, for instance.

DINOv2, by contrast, uses self-supervised learning and requires no human-written captions, so the model can absorb background information and details that are hard for humans to describe. Meta says this makes it useful for building AI that understands what is in a video and what should be placed where depending on the situation, a major advantage for developing VR content in particular.
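To make 'no captions required' concrete, here is a minimal sketch of pulling a general-purpose image embedding out of the pretrained DINOv2 backbone, assuming the PyTorch Hub entry points exposed by Meta's repository; the input file name is a placeholder, and no label or caption is involved anywhere.

```python
import torch
from PIL import Image
import torchvision.transforms as T

# Load the smallest pretrained DINOv2 backbone (ViT-S/14) from PyTorch Hub.
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()

# Standard ImageNet normalization; the crop size must be a multiple of the
# 14-pixel patch size (224 = 16 x 14).
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

img = Image.open('room_with_chair.jpg').convert('RGB')  # placeholder file name
batch = transform(img).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    embedding = model(batch)  # (1, 384) image-level feature, no caption needed

print(embedding.shape)
```

Because the embedding is learned from pixels alone rather than from text, it can encode everything in the image, the chair and the room around it, not only what a caption happened to mention.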



In collaboration with WRI Restoration, a non-profit organization that conducts nature restoration projects, Meta succeeded in mapping a forest the size of a continent down to individual trees.



Meta has also published a DINOv2 demo site where you can estimate depth from a photograph. When I loaded a landscape photo as a test, the model properly captured the trees, a sea of clouds, and the mountains visible beyond them.
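In the paper, depth estimation is evaluated by training only a small head on top of the frozen backbone. The following is a rough sketch of that idea; the head, training step, and data handling are assumptions for illustration, not Meta's released demo code.

```python
import torch
import torch.nn as nn

# Frozen DINOv2 backbone; only the tiny depth head below gets trained.
backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
backbone.eval()

EMBED_DIM = 384  # ViT-S/14 feature dimension
PATCH = 14       # patch size in pixels

# One linear layer maps each patch token to a single depth value.
depth_head = nn.Linear(EMBED_DIM, 1)
optimizer = torch.optim.AdamW(depth_head.parameters(), lr=1e-3)

def predict_depth(images):
    """images: (B, 3, H, W) with H and W multiples of 14."""
    B, _, H, W = images.shape
    with torch.no_grad():
        # Per-patch features from the frozen backbone: (B, N, 384)
        tokens = backbone.forward_features(images)['x_norm_patchtokens']
    depth = depth_head(tokens)  # (B, N, 1)
    return depth.reshape(B, H // PATCH, W // PATCH)  # coarse per-patch depth map

# Hypothetical training step against ground-truth depth downsampled to patch
# resolution:
#   loss = nn.functional.l1_loss(predict_depth(imgs), gt_depth_patches)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```

The point is that the backbone itself never sees a depth label; the self-supervised features are informative enough that a single linear layer can read depth out of them.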



It also accurately captures the outlines of baby foxes popping out of their burrows.
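One way such outlines fall out of the features, shown in the DINOv2 paper, is to project the per-patch tokens onto their first principal component, which tends to separate the foreground object from the background. A minimal sketch, with a hypothetical fox photo as input:

```python
import torch
from PIL import Image
from sklearn.decomposition import PCA
import torchvision.transforms as T

model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()

transform = T.Compose([
    T.Resize((448, 448)),  # multiple of the 14-pixel patch size -> 32x32 patches
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
img = transform(Image.open('fox.jpg').convert('RGB')).unsqueeze(0)  # placeholder file

with torch.no_grad():
    tokens = model.forward_features(img)['x_norm_patchtokens'][0]  # (1024, 384)

# Project each patch token onto the first principal component.
pca = PCA(n_components=1)
component = pca.fit_transform(tokens.numpy()).reshape(32, 32)

# Patches scoring high on the first component tend to cover the object, so a
# simple threshold already yields a rough foreground mask / outline.
mask = component > component.mean()
print(mask.astype(int))
```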



DINOv2 is expected to improve digital backgrounds for video chat, tagging of video content, and new types of AR content and visual tools. Overseas outlet Social Media Today commented that it could eventually enable AI-generated VR worlds and, ultimately, the construction of entire interactive virtual environments.

Meta has open-sourced DINOv2, and the PyTorch code and models are available to anyone on GitHub.

GitHub - facebookresearch/dinov2: PyTorch code and models for the DINOv2 self-supervised learning method.

https://github.com/facebookresearch/dinov2

in Software, Posted by log1l_ks