Meta releases 'SAM Audio', an AI that can isolate just the desired sound from audio or video containing many different sounds



Meta has released SAM Audio, an AI model for audio separation. SAM Audio accepts both audio and video as input and supports three kinds of operation: extracting a specific sound described in text, extracting the sound made by a specific subject in a video, and extracting a sound that is heard during a specific time range.

SAM Audio

https://ai.meta.com/samaudio/

Introducing SAM Audio: The First Unified Multimodal Model for Audio Separation | AI at Meta - YouTube


SAM Audio accepts video as well as audio. For example, to extract only the guitar from a video recording of a music session, enter the word 'guitar' as a text prompt.



This isolates just the guitar sound from the video.



You can also click on a subject in the video to extract only the sound that subject is making. Even if the video contains a mixture of human voices and train noise, you can click on a person and extract only that person's speech.



You can also select the portion of the timeline in which a sound occurs, such as a stretch of birdsong, and have SAM Audio extract that sound from the entire video. Meta calls this 'span prompts.'
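To make the three prompt types concrete, here is a minimal Python sketch of how they might be represented, and how a span prompt (a list of time ranges) could be turned into a per-frame selection mask. This is not the SAM Audio API: the `SeparationPrompt` class, its field names, and the `spans_to_frame_mask` helper are all hypothetical and exist only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SeparationPrompt:
    """Hypothetical container for the three prompt types (not the real API)."""
    text: Optional[str] = None                          # e.g. "guitar"
    click_xy: Optional[Tuple[int, int]] = None          # pixel clicked in a video frame
    spans: Optional[List[Tuple[float, float]]] = None   # (start_s, end_s) time ranges


def spans_to_frame_mask(spans, duration_s, frames_per_second):
    """Return a list of booleans marking frames covered by any selected span."""
    n_frames = int(round(duration_s * frames_per_second))
    mask = [False] * n_frames
    for start_s, end_s in spans:
        if end_s <= start_s:
            raise ValueError("span end must be after span start")
        # Clamp the span to the clip and convert seconds to frame indices.
        first = max(0, math.floor(start_s * frames_per_second))
        last = min(n_frames, math.ceil(end_s * frames_per_second))
        for i in range(first, last):
            mask[i] = True
    return mask


# Hypothetical usage: the user marks 1.0 s to 2.5 s of a 5 s clip as birdsong.
prompt = SeparationPrompt(spans=[(1.0, 2.5)])
mask = spans_to_frame_mask(prompt.spans, duration_s=5.0, frames_per_second=10)
```

In this sketch the mask only records where the user says the target sound is present; how a model would use that hint to find the same sound elsewhere in the recording is outside the scope of the example.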



Meta claims that SAM Audio outperforms other audio separation AI models.




The SAM Audio model weights can be downloaded from the link below.

sam-audio - a facebook Collection
https://huggingface.co/collections/facebook/sam-audio



Instructions on how to use the model are also available at the following link:

GitHub - facebookresearch/sam-audio: The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
https://github.com/facebookresearch/sam-audio



in AI, Video, Posted by log1o_hf