MIT develops an AI system that extracts the sound of a specific instrument when you click on it in a performance video

The Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT) has developed "PixelPlayer," an artificial intelligence (AI) system that separates the sounds of individual instruments in videos of musical performances. With this system, you can isolate an instrument's sound or adjust its volume simply by clicking on the performer in the video.

The Sound of Pixels

PixelPlayer identifies the instruments shown in a video at the pixel level and extracts the sound associated with each one. It analyzes the video and the audio together without any human annotation, linking each sound to the person playing it on screen. As a result, clicking on a performer in the video plays only that instrument's sound, and the volume of each instrument can be adjusted individually.

You can see PixelPlayer in action in the following video.

Editing Music in Videos Using AI - YouTube

First, load the video into PixelPlayer.

To hear only the guitar in a guitar-and-violin duet, click on the man playing the guitar. Only the guitar's sound is then extracted.

A video of a tuba and a trumpet playing the "Super Mario Bros." theme

Clicking on the man playing the trumpet turns the tuba down, so that only the trumpet is heard.

Clicking on the man playing the tuba then brings the tuba's sound back in.

PixelPlayer learned to identify the sounds of more than 20 instruments by analyzing over 60 hours of performance videos with self-supervised deep learning.

It can also freely adjust the volume of each instrument.
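Once each instrument has been separated into its own track, the "click to solo" and per-instrument volume controls described above amount to a weighted sum of those tracks. The Python below is a minimal sketch under that assumption only. It is not PixelPlayer's code (the dataset and code had not yet been released), the `remix` and `solo` helpers and the toy sample values are hypothetical, and the separation step itself, which is the hard part, is not shown.

```python
from typing import Dict, List


def remix(stems: Dict[str, List[float]], gains: Dict[str, float]) -> List[float]:
    """Mix separated per-instrument tracks back together with per-instrument gains.

    Hypothetical helper: assumes `stems` already holds one separated waveform
    per instrument, all with the same number of samples. Instruments that are
    missing from `gains` keep a gain of 1.0 (unchanged volume).
    """
    lengths = {len(track) for track in stems.values()}
    if len(lengths) > 1:
        raise ValueError("all stems must have the same number of samples")
    n = lengths.pop() if lengths else 0

    mix = [0.0] * n
    for name, track in stems.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(track):
            mix[i] += gain * sample  # scale this instrument, then add it to the mix
    return mix


def solo(stems: Dict[str, List[float]], instrument: str) -> List[float]:
    """Mimic clicking one performer: keep that instrument at full volume, mute the rest."""
    gains = {name: (1.0 if name == instrument else 0.0) for name in stems}
    return remix(stems, gains)
```

For example, with `stems = {"trumpet": [0.5, -0.5, 0.25], "tuba": [0.1, 0.2, 0.3]}`, `solo(stems, "trumpet")` returns `[0.5, -0.5, 0.25]`, which corresponds to clicking the trumpet player in the "Super Mario Bros." example, while `remix(stems, {"tuba": 0.5})` keeps the trumpet unchanged and brings the tuba in at half volume.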

However, when you actually listen, the quality of the extracted sound varies from instrument to instrument, so there is still room for improvement. According to Hang Zhao of the research team, PixelPlayer will be able to identify more instruments as it is given more training data. In the future, the technique could be used to improve the sound quality and rebalance the volume of old performance videos to make them easier to hear, or to help robots understand the environmental sounds around them. PixelPlayer's dataset and code are scheduled to be released soon.

