OpenAI announces high-performance transcription AI 'Whisper', supports Japanese and can transcribe tongue twisters and lyrics with high accuracy



OpenAI, the AI development organization behind high-performance AI such as the image generation AI 'DALL·E 2' and the text generation AI 'GPT-3', has announced 'Whisper', a newly developed AI that transcribes speech. The samples released alongside the announcement show that it transcribes without problems even audio such as "fast-talking sales talk" and "high-tempo song lyrics".

Introducing Whisper
https://openai.com/blog/whisper/

GitHub - openai/whisper
https://github.com/openai/whisper

Whisper is a transcription AI trained on a total of 680,000 hours of speech data collected from the Internet. OpenAI's blog post includes audio samples such as "fast-talking sales talk", "K-POP songs", "French", and "conversation with a distinctive accent", and clicking "REVEAL TRANSCRIPT" shows the result of Whisper's transcription.



About one-third of the speech data Whisper was trained on is non-English, so it also supports transcription of languages other than English, such as Japanese, French, and Korean. Transcription accuracy differs from language to language, but the graph of word error rate by language shows 6.4% for Japanese, meaning Japanese is transcribed with quite high accuracy among the languages Whisper supports.
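For readers unfamiliar with the metric, word error rate is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The following is a minimal sketch of that common definition, not OpenAI's evaluation code. The function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumption; how text was tokenized or normalized for the 6.4% Japanese figure is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace. Languages written
    without spaces (such as Japanese) need a different tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four reference words gives 0.25, i.e. 25%.
print(word_error_rate("the quick brown fox", "the quick brown dog"))
```

Under this definition, a word error rate of 6.4% corresponds to roughly 6 word-level errors per 100 reference words. Because insertions count as errors, the value can exceed 100% when the transcript is much longer than the reference.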



OpenAI publishes Whisper's model data and source code in its official GitHub repository. A demo that runs on 'Colaboratory', Google's Python execution environment, is also available, and you can easily run it with a Google account.

LibriSpeech.ipynb - Colaboratory
https://colab.research.google.com/github/openai/whisper/blob/master/notebooks/LibriSpeech.ipynb
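For those who prefer to try the published code locally rather than in the notebook, the sketch below shows roughly what a transcription call looks like with the Python package from the GitHub repository. It assumes the package is installed and that ffmpeg is available on the system. The wrapper function `transcribe_file`, the file name in the comment, and the choice of the "base" model are illustrative assumptions; `whisper.load_model` and `model.transcribe` are the calls shown in the repository's README.

```python
from typing import Optional


def transcribe_file(audio_path: str,
                    model_name: str = "base",
                    language: Optional[str] = None) -> str:
    """Transcribe one audio file and return the recognized text.

    Hypothetical helper for illustration. The import is done lazily so
    that defining this function does not require Whisper to be installed.
    """
    import whisper  # provided by the openai/whisper repository

    # Downloads the model weights on first use, then loads them.
    model = whisper.load_model(model_name)

    # language=None lets Whisper detect the spoken language itself;
    # pass a code such as "ja" to force Japanese.
    result = model.transcribe(audio_path, language=language)
    return result["text"]


# Example call (hypothetical file name, not run here):
# print(transcribe_file("speech.mp3", language="ja"))
```

Larger models are generally more accurate but slower and more memory-hungry, so "base" is only a convenient starting point for a quick trial.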



in Software, Posted by log1o_hf