I tried the transcription function of "AI Maker", which automatically transcribes the content of YouTube videos


by Green Chameleon

"AI Maker", an AI platform that anyone can use freely, has added a new transcription function. It can of course transcribe uploaded video and audio files, and it is also said to write out the full content automatically from nothing more than a pasted YouTube URL, so I actually tried using it.

I made a meeting-minutes AI with "AI Maker" that transcribes text from image / audio / video files, YouTube, and recordings
https://qiita.com/2zn01/items/97b4f6dcbbfc4119c282


To use AI Maker's transcription function, first access the following page and click "Log in with Twitter".

AI Maker
https://text.aimaker.io/recognize/



Enter your Twitter account name and password, then click "Login".



A page like this will be displayed.



The transcription function supports many languages. For English alone, several regional variants such as Australia, Canada, Ghana, and Britain can be selected. This time I will use the default, Japanese.



There are three types of transcription function. The first is "Transcribe from image, audio, and video files".

Files of up to 10 MB / 5 minutes can be transcribed for free, and the supported file formats are as follows.

JPEG / PNG / GIF / WAV / MP3 / WMA / AAC / M4A / FLAC / OGG / MP4 / AVI / FLV / MOV / WMV
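
The free limits and the format list above can be turned into a quick check before uploading. The following is a minimal Python sketch, not AI Maker's actual validation logic: the function name `fits_free_tier` is hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes, treating both limits as inclusive, and accepting `.jpg` as JPEG are all assumptions.

```python
from pathlib import Path

# Formats listed in the article for the file-upload transcription.
# "jpg" is added as an alias for JPEG (assumption, not stated in the article).
SUPPORTED_EXTENSIONS = {
    "jpeg", "jpg", "png", "gif",
    "wav", "mp3", "wma", "aac", "m4a", "flac", "ogg",
    "mp4", "avi", "flv", "mov", "wmv",
}

# Free limits from the article. Interpreting "10 MB" as 10 * 1024 * 1024 bytes
# and treating the limits as inclusive are assumptions.
FREE_MAX_BYTES = 10 * 1024 * 1024
FREE_MAX_SECONDS = 5 * 60


def fits_free_tier(filename: str, size_bytes: int, duration_seconds: float = 0.0) -> bool:
    """Return True if the file looks eligible for free transcription.

    duration_seconds can stay 0.0 for images, which have no running time.
    """
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        return False
    return size_bytes <= FREE_MAX_BYTES and duration_seconds <= FREE_MAX_SECONDS
```

For example, `fits_free_tier("memo.mp3", 4_700_000, 300)` passes under these assumptions, while a 6-minute file or an unlisted format such as `.mkv` would not.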


The second is "Transcribe from YouTube".


The third is "Record and transcribe", which uses the microphone of a PC or smartphone.


How accurately can it transcribe? To find out, I first tried transcribing a YouTube video of someone talking to Amazon's latest smart speaker, "Amazon Echo Spot".

Actually operating "Amazon Echo Spot" by voice - YouTube


Paste the URL and click "Transcribe from YouTube".



You will be asked "Do you want to transcribe the specified YouTube video?", so click "OK".


Wait for a while. According to the site, the transcription process is not interrupted even if the browser is closed. Transcribing the 1 minute 4 second video took about 35 seconds.
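
As a rough way to put that timing in perspective, the single measurement above can be expressed as a ratio of processing time to video length. This is only a sketch in Python: the function names are hypothetical, and the estimate for other lengths assumes processing time scales linearly with video length, which one measurement cannot confirm.

```python
# Measurement reported in the article: a 1 min 4 s video took about 35 s.
MEASURED_MEDIA_SECONDS = 64
MEASURED_PROCESSING_SECONDS = 35


def realtime_factor(processing_seconds: float, media_seconds: float) -> float:
    """Processing time divided by media length (below 1.0 is faster than real time)."""
    return processing_seconds / media_seconds


def estimate_processing_seconds(media_seconds: float) -> float:
    """Rough estimate for another video; ASSUMES linear scaling with length."""
    factor = realtime_factor(MEASURED_PROCESSING_SECONDS, MEASURED_MEDIA_SECONDS)
    return media_seconds * factor


if __name__ == "__main__":
    print(f"real-time factor: {realtime_factor(35, 64):.2f}")
    print(f"estimated seconds for a 5-minute video: {estimate_processing_seconds(300):.0f}")
```

Under the linearity assumption, a 5-minute video would take somewhere around 2 minutes 45 seconds, but server load and queueing could easily change that.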



The conversation in the video was transcribed as follows. Some corrections are needed, since for example "unknown" came out as "beautiful" and "consultation" as "competition", but overall the text was transcribed correctly. It is also easy to read, because a line break is inserted wherever the speaker changes.



Note that AI Maker's transcription function can be used for free only once a day.



To use it more than that, you need to purchase credits from this page. The fee is $0.1 per minute (about 12 yen) for audio files and $0.1 per image for images. Select the "1 dollar" button and click "Purchase".
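
The pricing above is simple enough to estimate before buying credits. Here is a minimal Python sketch: the function name `estimate_cost_cents` is hypothetical, and billing partial minutes by rounding up to the next whole minute is an assumption, since the article only gives the per-minute and per-image rates.

```python
import math

# Rates from the article, held in cents to avoid floating-point rounding issues.
AUDIO_CENTS_PER_MINUTE = 10  # $0.1 per minute of audio
IMAGE_CENTS_PER_IMAGE = 10   # $0.1 per image


def estimate_cost_cents(audio_seconds: float = 0.0, image_count: int = 0) -> int:
    """Estimate the credit needed, in US cents.

    ASSUMPTION: partial minutes are rounded up to a whole minute.
    """
    billed_minutes = math.ceil(audio_seconds / 60)
    return billed_minutes * AUDIO_CENTS_PER_MINUTE + image_count * IMAGE_CENTS_PER_IMAGE
```

Under these assumptions, a 5-minute audio file would come to 50 cents, so the $1 purchase would cover about 10 minutes of audio or 10 images.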



After paying by credit card ...



the credit was added like this.



When I tried it with various videos, the 5-minute guide to "Saigo Don" ...

"Saigo Don" in 1 minute "Satsuma no Yakusenbo" - YouTube


and the 5-minute guide to "Kamen Rider Build" were not supported; the message said "This video cannot be downloaded, so it cannot be transcribed."

Kamen Rider Build explained in 5 minutes [official] - YouTube


Since the 5-minute guide to the anime "Violet Evergarden" was supported, I tried transcribing it.

Anime "Violet Evergarden" explained in 5 minutes, Part 1 - YouTube


The result is like this.



In the line "We acted together and fought battles every day", the word "battle" was transcribed as "Sento", and some lines of dialogue are missing, so the text needs to be checked while listening to the audio. Even so, the accuracy is good enough to make transcription work considerably lighter.


It can also transcribe text from images. Given an image of the text of Osamu Dazai's "Bishoujo" like this ...



it was transcribed quite accurately.



However, with a photo of a two-page spread of a book, or with a distinctive font ...



the accuracy seems to drop.



Audio files can be loaded as well. When I loaded an MP3 file of the completion announcement for the Wide FM compatible radio "Hint BLE Radio" ...



the result made me think "Isn't this obviously strange as Japanese?"



The transcription results of uploaded files are shown in a "transcription list", from which you can copy the text, download it as a CSV, Excel, or PDF file, or print it. I tried copying the text to check it ...



As you can see, the transcript contained things that made me think "Nobody said anything like that ...". It was a 5 minute, 4700 KB MP3 file, but the amount of transcribed content was also nowhere near enough. It seems that transcription sometimes does not go well when the original speech is fast.

in Review,   Web Service,   Video, Posted by darkhorse_log