Will 'NeRF,' a technology that generates 3D models from multiple still images, advance deepfakes?

Deepfake is a technology that uses AI to create fake portraits and videos. It has become controversial in various fields, with cases such as fake porn videos of celebrities and fake videos disguised as politicians' speeches on the rise. Metaphysic.ai, a technology media outlet, summarizes the challenges facing deepfakes and the technologies, such as 'Neural Radiance Fields (NeRF),' that have the potential to overcome them.

NeRF: An Eventual Successor for Deepfakes? --Metaphysic.ai

In recent years, AI technology has made great strides. A study published in February 2022 by Lancaster University in the United Kingdom reported that most people cannot distinguish AI-generated faces from real ones, and that AI-generated faces are even rated as more trustworthy than real faces.

AI-generated faces are indistinguishable from real faces and are more reliable than real faces-GIGAZINE

There are various AI-based technologies, but because deepfakes can make real celebrities and politicians appear in fake images and videos, their misuse is considered dangerous in many fields. Most discussion of deepfakes refers to two open source packages that appeared in 2017, 'DeepFaceLab (DFL)' and 'FaceSwap.' However, these projects are said not to deviate much from the code that a mysterious developer known as 'deepfakes' released on GitHub in 2017.

Of course, DFL and FaceSwap have broad user bases and developer communities, so improvements have been made, such as the ability to use larger images for training and mechanisms that automatically remove obstructions. Even so, the improvement in deepfake quality over the past three years appears to come mainly from better data collection and training methods rather than from any overhaul of the core.

At the time the article was written, deepfake software was generally trained on a single GPU, which makes it difficult to train on large amounts of data. Because of this bottleneck, even a very short video takes a long time to produce. When relatively large images such as 512 x 512 pixels are used, the number of images that can be used for training is limited, which prevents the model from generalizing well. A model that fails to generalize well either cannot extract the essential features of the data or can only reproduce the original data.

'NeRF,' a technology that appeared in 2020, is seen as a possible way to overcome these deepfake problems. NeRF combines images taken from multiple viewpoints in a neural network to generate a 3D model of an object or environment. Because it captures shape, texture, transparency, lighting, and so on, it can estimate and synthesize the parts missing from the input images.
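
The article does not describe how a NeRF turns its learned 3D representation back into an image. As a rough sketch, the standard formulation samples points along each camera ray, asks the network for a density and a color at each point, and alpha-composites them front to back. In the code below, `toy_field` is a hypothetical stand-in for the trained network, and the sample spacing and density values are arbitrary illustrative choices.

```python
import math


def composite_ray(densities, colors, step):
    """Alpha-composite samples along one camera ray, front to back."""
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this ray segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


def toy_field(depth):
    """Hypothetical stand-in for a trained NeRF: a red slab between depth 1 and 2."""
    if 1.0 <= depth <= 2.0:
        return 50.0, (1.0, 0.0, 0.0)
    return 0.0, (0.0, 0.0, 0.0)


step = 0.05
depths = [i * step for i in range(80)]  # sample depths from 0.0 to 3.95
samples = [toy_field(t) for t in depths]
pixel, remaining = composite_ray(
    [sigma for sigma, _ in samples],
    [rgb for _, rgb in samples],
    step,
)
print(pixel, remaining)
```

Because the slab is dense, almost no light passes through it, so the ray's color ends up essentially pure red. In a real NeRF the density and color come from a neural network fitted to the input photographs.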

'Instant NeRF,' a NeRF technology announced by NVIDIA in 2022, can synthesize a complex 3D model from just a few images, and cuts training that used to take from hours to tens of hours down to just a few seconds. The following movie shows a 3D model actually being created from four images.

NVIDIA Instant NeRF: NVIDIA Research Turns 2D Photos Into 3D Scenes in the Blink of an AI-YouTube

Instant NeRF achieves its exceptionally fast training by discarding 'information that does not directly affect content generation.' In other words, it excludes from the outset any information that would be cut from the final 3D image anyway, and speeds up training by avoiding as much unnecessary processing as possible. This approach also makes caching more flexible and frees up cache capacity, with the added benefit of a more responsive interface.
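
The article does not say exactly how this is implemented. One common way to avoid work that cannot affect the final image is to skip samples in regions known to be empty and to stop marching along a ray once everything behind is hidden. The sketch below illustrates that general idea only and is not NVIDIA's code. `toy_field` and `toy_occupied` are hypothetical stand-ins for the trained network and an occupancy lookup, and the threshold is an arbitrary illustrative value.

```python
import math


def render_with_skipping(field, occupied, depths, step, min_transmittance=1e-4):
    """Composite one ray while skipping samples that cannot change the pixel."""
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    evaluations = 0  # how many times the (expensive) field was queried
    for depth in depths:
        if transmittance < min_transmittance:
            break  # early termination: everything behind this point is hidden
        if not occupied(depth):
            continue  # known-empty space: no need to query the field
        sigma, rgb = field(depth)
        evaluations += 1
        alpha = 1.0 - math.exp(-sigma * step)
        for k in range(3):
            pixel[k] += transmittance * alpha * rgb[k]
        transmittance *= 1.0 - alpha
    return pixel, evaluations


def toy_field(depth):
    """Hypothetical stand-in for a trained NeRF: a dense red slab."""
    return 50.0, (1.0, 0.0, 0.0)


def toy_occupied(depth):
    """Hypothetical occupancy lookup: only depths 1 to 2 contain anything."""
    return 1.0 <= depth <= 2.0


step = 0.05
depths = [i * step for i in range(80)]
pixel, evaluations = render_with_skipping(toy_field, toy_occupied, depths, step)
print(pixel, evaluations, "of", len(depths), "samples evaluated")
```

In this toy scene only a handful of the 80 samples along the ray are evaluated, yet the resulting pixel is nearly identical to what a full evaluation would give.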

In addition, applications that 'recreate human movements with an arbitrary 3D model' are being researched mainly by the NeRF research community in Asia. 'ST-NeRF,' announced by ShanghaiTech University in 2021, makes it possible to duplicate performers and to change the size ratio of their 3D models at will.

duplicate clip --YouTube

AD-NeRF, a technology jointly developed by four universities in China, has used NeRF to create a video of a target person giving a speech from images of that person and spoken audio data.

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis-YouTube

NeRF is expected to have a wide range of applications, but as with deepfakes, there are practical limits on the input size of training images, and it is also difficult to scale.

Waymo, which develops self-driving cars, uses this kind of image synthesis to simulate autonomous driving. To address the problems above, it is developing 'Block-NeRF,' a technology that builds high-resolution environments and objects by combining multiple lower-resolution NeRFs. The following video shows 3D models of roads and streets actually generated by Block-NeRF.

Block-NeRF --YouTube
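
The article does not explain how the separate NeRFs are combined. One plausible approach, sketched here under that assumption, is to render the same pixel from each nearby block and blend the results with weights that fall off with the camera's distance from each block's center (inverse distance weighting). The block centers, colors, and the `power` value below are hypothetical illustrative values.

```python
import math


def blend_blocks(camera_xy, blocks, power=4.0):
    """Blend per-block pixel colors using inverse distance weighting.

    blocks: list of (center_xy, rgb) pairs, where rgb is the color that
    block's NeRF rendered for the pixel in question.
    """
    weights = []
    for center, _ in blocks:
        distance = math.dist(camera_xy, center)
        weights.append(1.0 / max(distance, 1e-9) ** power)
    total = sum(weights)
    pixel = [0.0, 0.0, 0.0]
    for weight, (_, rgb) in zip(weights, blocks):
        for k in range(3):
            pixel[k] += (weight / total) * rgb[k]
    return pixel


# Two hypothetical blocks 10 units apart that rendered different colors.
blocks = [((0.0, 0.0), (1.0, 0.0, 0.0)), ((10.0, 0.0), (0.0, 0.0, 1.0))]
midpoint = blend_blocks((5.0, 0.0), blocks)
near_first = blend_blocks((1.0, 0.0), blocks)
print(midpoint, near_first)
```

Halfway between the blocks the two renderings contribute equally, while close to one block its rendering dominates, which gives a smooth transition as the camera moves through the scene.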

Metaphysic.ai points out that technologies may emerge that combine the advantages of NeRF with those of a generative adversarial network (GAN), which improves the accuracy of learning by pitting two neural networks against each other, so that each makes up for the other's shortcomings. Since the input images for NeRF do not have to be photographs of the real world, applications such as generating a 3D model from GAN-generated images are conceivable. Several papers are said to have already proposed techniques that combine GAN and NeRF.
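
To make the 'two competing networks' idea concrete, here is a minimal sketch of the standard GAN losses, assuming the discriminator outputs the probability that a sample is real. The function names and probability values are illustrative only and do not come from the article or any specific paper.

```python
import math


def discriminator_loss(p_real, p_fake):
    """Low when real samples score near 1 and generated samples score near 0."""
    return -(math.log(p_real) + math.log(1.0 - p_fake))


def generator_loss(p_fake):
    """Low when the discriminator is fooled into scoring generated samples near 1."""
    return -math.log(p_fake)


# A discriminator that separates real from fake well has a low loss...
print(discriminator_loss(p_real=0.9, p_fake=0.1))
# ...but that same situation means a high loss for the generator.
print(generator_loss(p_fake=0.1))
# Once the generator fools the discriminator, the roles reverse.
print(generator_loss(p_fake=0.9))
```

Each network improves by driving its own loss down, which pushes the other's loss up. That tug of war is what gradually makes the generated images harder to tell from real ones.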

in Software,   Science,   Video, Posted by log1h_ik