Ultra-high-accuracy Japanese-made speech recognition AI 'ReazonSpeech' has been released free of charge, so I tried its transcription function
data:image/s3,"s3://crabby-images/3fb14/3fb142cc1fe06636095fbc9ea4ec8f978ec9e7ff" alt=""
Tokyo-based technology company Reazon Holdings has released 'ReazonSpeech' free of charge. It includes one of the largest Japanese speech corpora in Japan, at 19,000 hours. At the same time, the company released a transcription service that touts performance comparable to the ultra-high-performance speech recognition AI 'Whisper'.
Free release of 'ReazonSpeech', a purely domestic Japanese speech recognition model that can be used commercially with ultra-high accuracy - Reazon Human Interaction Lab
https://research.reason.jp/news/reasonspeech.html
ReazonSpeech - Reazon Human Interaction Lab
https://research.reason.jp/projects/ReasonSpeech/
Under the 'ReazonSpeech' name, Reazon Holdings has released three products free of charge: the 'ReazonSpeech speech recognition model', which can be used for transcription; the 'ReazonSpeech corpus creation tool', which can automatically extract a speech corpus from recorded TV broadcasts and similar data; and the 'ReazonSpeech speech corpus', a high-quality 19,000-hour corpus for training Japanese speech recognition models. Of these, the 'ReazonSpeech speech recognition model' achieves accuracy comparable to Whisper. Reazon Holdings also released a transcription service built on the 'ReazonSpeech speech recognition model' at the same time, so I checked its transcription accuracy for myself.
First, access the demo page of the transcription service and click 'Try speech recognition'.
data:image/s3,"s3://crabby-images/7a81b/7a81b1a9e80035b2a35d6e509496fe36b93e21ec" alt=""
When a pop-up asking for permission to use the microphone appears at the upper left of the screen, click 'Allow'.
data:image/s3,"s3://crabby-images/9add2/9add2590bb151d16ec32c2546528e5254d62231d" alt=""
'Recording (5 seconds)' is then displayed, so read aloud the text you want transcribed. You have 5 seconds.
data:image/s3,"s3://crabby-images/d0ea1/d0ea1980f9816a1284a7e47351bcc0dc7e567fba" alt=""
Speech recognition stops automatically after 5 seconds, and the transcription result is displayed on the right side of the screen. The result of reading aloud the beginning of 'I Am a Cat' is shown below.
data:image/s3,"s3://crabby-images/55f53/55f53356d8a3027872a761ee08da118c05ca51b5" alt=""
Since the demo page can only transcribe up to 5 seconds of audio, I also tried transcribing a longer passage using the notebook published for Google's Python execution environment, Google Colab. First, while logged in to Google, click 'Open in Colab'.
After the notebook opens, click 'Copy to Drive'.
data:image/s3,"s3://crabby-images/e63dd/e63dd96829937ad71b0c2585cdefbdbcf0aaf90d" alt=""
When the copy is complete, click the play button in the cell marked 'ESPnet2 installation'.
data:image/s3,"s3://crabby-images/ac855/ac8559bb699e6cd9819aaadd0be6d302bdc0da48" alt=""
Wait a few minutes. The step is complete when a green check mark is displayed.
data:image/s3,"s3://crabby-images/20019/200192f167f97fa194eae0267c0f629908f72c68" alt=""
Next, while logged in to Hugging Face, access the ReazonSpeech
data:image/s3,"s3://crabby-images/037fe/037fe8d158ab431e96735a9ce0eb62b931d6f68a" alt=""
Next, access Hugging Face's
data:image/s3,"s3://crabby-images/f62c1/f62c1fe417101cac05ae1408ddcec72f4a2ce538" alt=""
When the token creation screen is displayed, enter a name of your choice and click 'Generate a token'.
data:image/s3,"s3://crabby-images/bca69/bca69d831b3587d13214119ce2d42046c6231403" alt=""
The token has been created successfully if it appears like this.
data:image/s3,"s3://crabby-images/5521a/5521ae519f2df124cec03918aed276d348417847" alt=""
Return to the Google Colab screen and click the play button in the cell marked 'from huggingface_hub import notebook_login'.
data:image/s3,"s3://crabby-images/de25f/de25f33601dbabd408b13df95010f983ad16ed7d" alt=""
When the following screen is displayed, enter the created token and click 'Login'.
data:image/s3,"s3://crabby-images/acab5/acab5ff740b8cdf7d295c781c1944292d2cbe058" alt=""
Then click the play button in the cell marked 'import torch'.
data:image/s3,"s3://crabby-images/5bb4a/5bb4a7f2c1e073537df9ec4aac67cee7a5841b75" alt=""
Scroll down when you see a green check mark.
data:image/s3,"s3://crabby-images/4a6aa/4a6aad1aa298be2b0050afe9d08a1ff69d23d341" alt=""
Click the play button in the part marked 'Voice Recognition'.
data:image/s3,"s3://crabby-images/f4843/f484362bb4f9d3c99859043f3933d083ab5d805c" alt=""
When the green check mark is displayed, click the play button in the cell marked 'Recording'.
data:image/s3,"s3://crabby-images/e415b/e415b7ac5f9a0b584bf7f6872c2c08137c084a2a" alt=""
Preparations are complete when a green check mark appears.
data:image/s3,"s3://crabby-images/490b8/490b85cbfb24feb69f92f01fcfc5b82628c96aa5" alt=""
Scroll to the bottom until you reach the cell marked 'seconds: 5'. Changing the value in 'seconds: 5' sets how many seconds of audio are recorded for transcription.
data:image/s3,"s3://crabby-images/efea5/efea53074067c16566b9f11cb2b64436d2bf7516" alt=""
As a test, I set the recognition time to 30 seconds and clicked the play button.
data:image/s3,"s3://crabby-images/961ec/961ecab24f581af7eebbed34ca0589447bcb0d9c" alt=""
When asked for permission to use the microphone, click 'Allow'.
data:image/s3,"s3://crabby-images/19afa/19afa2e144aced490904335659b151202b6b529f" alt=""
The screen will display 'Please talk into the microphone for 30 seconds', so continue talking for 30 seconds.
data:image/s3,"s3://crabby-images/52348/523488f33ca1d564a0b74c0c84d8e8514a8ceee9" alt=""
After 30 seconds of speaking, wait a moment and the transcription result appears at the bottom of the screen. However, although I had read the opening of 'I Am a Cat', the result was: 'I am looking at whether it is not yet possible to say that this is what I am, but it is not like this. I think he said that he didn't just say that he was doing it, but I think he said that he wasn't doing it, but I wonder what he meant after this. It was there, so there was nothing to worry about.'
data:image/s3,"s3://crabby-images/e9661/e966106b8a62f939a93549c2e2711efced1b31fb" alt=""
I suspected that the transcription failed because I had lengthened the recognition time. … 'I can't get it, so I remember crying when I was about to be bullied.' I got a relatively decent transcription result. At the time of writing, the demo version of ReazonSpeech does not seem to recognize sentence breaks well.
data:image/s3,"s3://crabby-images/76345/763452af2fa88121046f984d461f2cf5b4a1d6eb" alt=""
in Review, Software, Web Application, Posted by log1o_hf