Will the technology 'NeRF' that generates 3D models from multiple still images advance deep fake?
Deepfake is a technology that uses AI to create fake images and videos of people, and is controversial in various fields, such as fake pornography videos of celebrities and fake videos disguised as politicians' speeches. increase. Metaphysic.ai , an AI development company, summarizes the challenges faced by such deep fakes and technologies such as `` Neural Radiance Fields '' that have the potential to overcome those challenges.
NeRF: An Eventual Successor for Deepfakes? - Metaphysic.ai
In recent years, AI technology has advanced dramatically, and in February 2022, a study published by the University of Lancaster and others in the UK found that most people cannot distinguish between the face created by AI and the real face, It has been reported that AI-generated faces are more reliable than real faces.
The face generated by AI is indistinguishable from the real face and is more reliable than the real face - GIGAZINE
There are various technologies that use AI, but deep fakes can make real celebrities and politicians appear in fake images and fake videos, so abuse is considered dangerous in various fields. Many of the topics about such deepfakes refer to the two open source packages ' DeepFaceLab (DFL) ' and ' FaceSwap ' that appeared in 2017, but the root of these projects is a lot of mystery called 'deepfakes'. The developer said that it did not deviate much from the code published on GitHub in 2017.
Of course, DFL and FaceSwap have a wide user base and developer community, so improvements have been made, such as the ability to use larger images for training models and the development of mechanisms to automatically remove obstacles. . However, it seems that the improvement in deep fake quality seen in the past three years is mainly due to improvements in data collection and training methods, not innovations in the core part.
At the time of writing the article, training of deep fake software is generally done with a single GPU, and there is a problem that it is difficult to train large-scale data. Due to this bottleneck, it takes a long time to create even a very short video, and when using relatively large images such as 512 x 512 pixels for training, the number of images used for training is limited, and the model is optimized. This poses the problem of hindering generalization. If the model cannot be generalized optimally, the essential features of the data cannot be extracted, or only things that are in line with the original data can be created.
A technology called `` NeRF '' that appeared in 2020 is believed to have the potential to overcome such deep fake problems. NeRF combines images taken from multiple viewpoints in a neural network to generate a 3D model of an object or environment. We can estimate.
Among them, the technology ``Instant NeRF'' announced by NVIDIA in 2022 can synthesize a complex 3D model from just a few images, and the training time that used to take hours to tens of hours is just a few seconds. You can train in. You can see how the 3D model is actually created from the four images by watching the following movie.
NVIDIA Instant NeRF: NVIDIA Research Turns 2D Photos Into 3D Scenes in the Blink of an AI-YouTube
Instant NeRF's exceptionally fast training speed is due to its ability to discard 'information that does not directly affect content generation.' In other words, Instant NeRF improves training speed by avoiding unnecessary processing as much as possible without considering the information that will be truncated in the final 3D image from the beginning. This mechanism also has the added benefit of making the interface more responsive, as it increases the flexibility and power of the cache.
In addition, the application of ``reproducing human movement with an arbitrary 3D model'' is being studied mainly by the NeRF research community in Asia. and 3D model size ratio can be changed arbitrarily.
AD-NeRF, a technology jointly developed by four Chinese universities, has succeeded in using NeRF to create a video in which a target person makes a speech from ``human image and speech data''.
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis-YouTube
NeRF is expected to have various applications, but like Deep Fake, there are practical limits on the input size of training images, and there are problems such as difficulty in scalability.
Waymo, which develops self-driving cars, uses deep fakes for self-driving simulations, and in order to solve the above problem, combine multiple low-resolution NeRF data to create high-resolution environments and objects 'Block-NeRF We are developing a technology called You can check the 3D models of roads and townscapes actually generated by Block-NeRF in the video below.
In the future, Metaphysic.ai may combine the advantages of generative adversarial networks (GAN) and NeRF, which improve the accuracy of data learning by having two neural networks compete, and develop technology that compensates for each other's shortcomings. pointed out. Since the input image of NeRF does not need to be a photograph of the real world, it can be applied to generate a 3D model based on the image generated by GAN. Several papers have already proposed technologies that combine GAN and NeRF.