WhisperNER, an open source AI model that can transcribe speech in real time while concealing confidential information
In recent years, many transcription AIs such as '
[2409.08107] WhisperNER: Unified Open Named Entity and Speech Recognition
https://arxiv.org/abs/2409.08107
GitHub - aiola-lab/whisper-ner: Official implementation of 'WhisperNER: Unified Open Named Entity and Speech Recognition'
https://github.com/aiola-lab/whisper-ner
aiOla unveils open source AI transcription that masks sensitive info | VentureBeat
https://venturebeat.com/ai/aiola-unveils-open-source-ai-audio-transcription-model-that-obscures-sensitive-info-in-realtime/
According to aiOla, WhisperNER is a transcription AI built on OpenAI's open source AI model 'Whisper' and is designed to protect privacy and support compliance with data protection regulations. WhisperNER processes audio files and applies named entity recognition (NER) during transcription, tagging or masking sensitive information such as names, phone numbers, and addresses in the output.
Another feature of WhisperNER is that it does not require separate automatic speech recognition (ASR) and NER tools, which reduces exposure to data breaches. 'We designed WhisperNER as an open source tool to improve AI privacy, allowing users to mask sensitive data without the need for additional software,' said Gill Hetz, vice president of research at aiOla.
The WhisperNER demo is publicly available on Hugging Face.
Whisper-NER (v1) - a Hugging Face Space by aiola
https://huggingface.co/spaces/aiola/whisper-ner-v1
In the demo, the input is a voice recording that says 'Hey, my name is John Doe. I live at 100 Maplewood Lane. Phone number is 05066321.' Enter the items you want to hide ('address,' 'name,' and 'phone-number') in the 'Entity Labels' field and click 'Submit.'
The 'Transcription and Entities' field then showed the transcription "Hey, my name is 'name'. I live at 'address'. Phone number is 'phone-number'." with the confidential information masked as requested.
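The masking step in this demo can be illustrated with a small Python sketch. This is not WhisperNER's actual code or API: the `EntitySpan` class and the `mask_entities` and `find_span` functions are hypothetical names invented here, and the sketch assumes the entity positions are already known. It only shows how tagged spans in a transcript could be swapped for the requested labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical container: a character range in the transcript plus its label."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def mask_entities(text: str, spans: list[EntitySpan], requested: set[str]) -> str:
    """Replace each requested entity span with its quoted label."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        # Skip labels the user did not ask for, and spans overlapping an earlier one.
        if span.label not in requested or span.start < cursor:
            continue
        pieces.append(text[cursor:span.start])
        pieces.append(f"'{span.label}'")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def find_span(text: str, phrase: str, label: str) -> EntitySpan:
    """Helper for this example only: locate a phrase and wrap it as a span."""
    start = text.index(phrase)
    return EntitySpan(start, start + len(phrase), label)


transcript = (
    "Hey, my name is John Doe. I live at 100 Maplewood Lane. "
    "Phone number is 05066321."
)
# Assumed entity positions; a real model would predict these from the audio.
spans = [
    find_span(transcript, "John Doe", "name"),
    find_span(transcript, "100 Maplewood Lane", "address"),
    find_span(transcript, "05066321", "phone-number"),
]
masked = mask_entities(transcript, spans, {"address", "name", "phone-number"})
print(masked)
```

Passing a smaller label set, for example only `{"name"}`, would leave the address and phone number untouched, which mirrors how the 'Entity Labels' field controls what gets hidden.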
WhisperNER can hide more than just confidential information. In another demo, the input was a voice message saying 'I plan on watching the Juve vs Arsenal game tonight. I hope Yildiz will score.' and the entity labels 'football-club, football-player, referee' were requested for masking.
The output transcript was 'I plan on watching the football-club game tonight. I hope football-player will score.', which shows that the specified proper nouns are appropriately hidden.
According to aiOla, WhisperNER was trained and evaluated only on English data, and at the time of writing, it was not possible to transcribe Japanese speech.
The WhisperNER source code can be downloaded from the following GitHub page.
GitHub - aiola-lab/whisper-ner: Official implementation of 'WhisperNER: Unified Open Named Entity and Speech Recognition'
https://github.com/aiola-lab/whisper-ner