Google develops 'MusicLM', an AI that automatically composes music from entered text



A Google research team has developed 'MusicLM', an automatic composition AI that generates music from entered text, much as 'Stable Diffusion' and 'DALL·E' automatically generate images from text.

[2301.11325] MusicLM: Generating Music From Text
https://doi.org/10.48550/arXiv.2301.11325

MusicLM
https://google-research.github.io/seanet/musiclm/examples/

Google created an AI that can generate music from text descriptions, but won't release it | TechCrunch
https://techcrunch.com/2023/01/27/google-created-an-ai-that-can-generate-music-from-text-descriptions-but-wont-release-it/

MusicLM was trained on a dataset totaling 280,000 hours of music, and composes music as instructed by text prompts such as 'an impressive saxophone solo and singing voice' or '90s Berlin techno'.



The paper published by Google includes example songs actually created by MusicLM. Below is a song generated from the prompt 'The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.'


'A rising synth is playing an arpeggio with a lot of reverb. It is backed by pads, sub bass line and soft drums. This song is full of synth sounds creating a soothing and adventurous atmosphere. It may be playing at a festival during two songs for a buildup.'


'Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive.'


Simply entering 'relaxing jazz' produces a result like this.


Also, by specifying time ranges, you can combine multiple styles into a single song. For example, the prompt 'jazz song (0:00-0:15) pop song (0:15-0:30) rock song (0:30-0:45) death metal song (0:45-1:00) rap song (1:00-1:15) string quartet with violins (1:15-1:30) epic movie soundtrack with drums (1:30-1:45) scottish folk song with traditional instruments (1:45-2:00)' outputs the following song.
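As a rough illustration only (MusicLM has no public API, and the helper below is purely hypothetical), a time-annotated prompt in the format above can be split into per-segment descriptions with a few lines of Python:

```python
import re

# Hypothetical helper, not part of any released MusicLM interface:
# parse a prompt like "jazz song (0:00-0:15) pop song (0:15-0:30)"
# into (start_seconds, end_seconds, description) segments.
SEGMENT_RE = re.compile(r"(.+?)\s*\((\d+):(\d\d)-(\d+):(\d\d)\)\s*")

def parse_prompt_schedule(prompt):
    segments = []
    for m in SEGMENT_RE.finditer(prompt):
        description = m.group(1).strip()
        start = int(m.group(2)) * 60 + int(m.group(3))  # mm:ss -> seconds
        end = int(m.group(4)) * 60 + int(m.group(5))
        segments.append((start, end, description))
    return segments

schedule = parse_prompt_schedule(
    "jazz song (0:00-0:15) pop song (0:15-0:30) rock song (0:30-0:45)"
)
# schedule holds one (start, end, description) tuple per style segment
```

Each tuple then describes which style should play over which stretch of the generated audio.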


Music can also be composed not only from text but also from images and their descriptions. Below is a song created by feeding MusicLM the image of Salvador Dali's 'The Persistence of Memory' together with the Encyclopedia Britannica description of the same work.


It is also possible to add vocals and chorus to songs. However, these only sound like vocals and chorus: the lyrics barely resemble English and carry no meaning.



Google's research team has not released MusicLM to the public because of the many ethical issues that systems like it pose. According to the team, MusicLM tends to incorporate songs from its training dataset into generated songs as-is: in one experiment, about 1% of the songs generated by the system were found to be copied directly from the dataset. 'We are aware of the potential misuse risks of creative content associated with this use case,' the research team said, strongly arguing that further work is needed to address these risks.
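The paper's actual memorization analysis operates on MusicLM's internal audio token sequences with approximate matching; as a simplified, hypothetical sketch of the idea, exact-copy detection over integer token sequences can be done with n-gram overlap:

```python
# Toy sketch only: the paper's real methodology differs (approximate
# matching over model audio tokens). Here a generated sequence counts
# as "memorized" if it shares any run of n consecutive tokens with
# some training sequence.
def memorized_fraction(generated, training_set, n=8):
    """Return the fraction of generated sequences that contain a
    length-n token run appearing verbatim in the training set."""
    train_ngrams = set()
    for seq in training_set:
        for i in range(len(seq) - n + 1):
            train_ngrams.add(tuple(seq[i:i + n]))

    def is_copy(seq):
        return any(tuple(seq[i:i + n]) in train_ngrams
                   for i in range(len(seq) - n + 1))

    return sum(is_copy(seq) for seq in generated) / len(generated)
```

With a check like this, a result of 0.01 would correspond to the roughly 1% copy rate reported for MusicLM.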

Although MusicLM itself has not been released, 'MusicCaps', the dataset used to evaluate it, is available below.

MusicCaps | Kaggle
https://www.kaggle.com/datasets/googleai/musiccaps
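For reference, MusicCaps is distributed as a CSV of captioned 10-second YouTube clips. A minimal loading sketch, assuming the core columns are `ytid`, `start_s`, `end_s`, and `caption` (the real file on Kaggle contains additional columns; the sample row below is synthetic):

```python
import csv
import io

# Synthetic stand-in for the real MusicCaps CSV downloaded from Kaggle;
# column names are assumptions based on the published dataset.
sample = (
    "ytid,start_s,end_s,caption\n"
    "abc123DEF45,30,40,a calm jazz piano trio with brushed drums\n"
)

def load_captions(fileobj):
    """Map each clip, keyed by (ytid, start_s, end_s), to its caption."""
    clips = {}
    for row in csv.DictReader(fileobj):
        key = (row["ytid"], int(row["start_s"]), int(row["end_s"]))
        clips[key] = row["caption"]
    return clips

clips = load_captions(io.StringIO(sample))
# clips[("abc123DEF45", 30, 40)] -> the clip's text caption
```

In practice you would open the downloaded CSV file instead of the inline string.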

in Software, Posted by log1i_yk