I tried transcribing an audio file using Google's speech recognition engine



"Google Cloud Platform", the cloud service offered by Google, includes a service called "Cloud Speech-to-Text". This service uses AI to recognize the speech in an audio file and turn it into text, so I tried it out to see how well it actually transcribes.

Cloud Speech-to-Text - Speech Recognition | Cloud Speech-to-Text API | Google Cloud
https://cloud.google.com/speech-to-text/

First of all, you need to be registered with Google's cloud platform, Google Cloud Platform. If you have not registered yet, please register by referring to the beginning of the following article.

How to set up a Minecraft server on Google's cloud for free and play multiplayer with several people - GIGAZINE



Once registration is complete, open the console screen. First, enable the API used for transcription: click "API and service" from the "navigation menu" in the upper left.



Click "Enable API and Service" at the top of the screen.



If you type "speech" in the part titled "Search for API and service" displayed in the middle of the screen ...



The list of APIs is narrowed down in real time as you type, so after entering "speech", click "Cloud Speech API" among the remaining results.



In the screenshot below the API is already enabled, so the button is displayed as "management". If it has not been enabled yet, the button reads "enable API" instead, so click it. The API is now ready.



Open the "Navigation Menu" in the upper left again and click "Storage" in the "Storage" section.



Click "Create Bucket" at the top.



Enter the name of the bucket. Any name is fine, but it must be unique in the world. I tried "mojiokoshi" this time, but got a warning that the name was already in use, so I worked around it by appending some letters, as in "mojiokoshi-abcde".



The bucket for uploading audio files is now ready. Next, prepare an audio file.



Before recording, please check Google's documentation on how to record. MP3 is an encoding often used for audio files, but it is a lossy compression format that reduces file size by discarding information that humans are unlikely to perceive. For a machine that can make use of all the information, however, this means necessary information has been cut away, so it is better to avoid MP3 as much as possible. In short, to improve recognition accuracy, the documentation recommends recording to the following specifications.

· Set the sampling frequency to 16 kHz or more
· Use a lossless encoding, FLAC or LINEAR16
· Place the microphone as close to the speaker as possible
· Avoid preprocessing such as resampling, noise reduction, and automatic gain control (AGC), and avoid clipping
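
As a rough way to check a recording against the first two points in the list above, the sample rate and encoding can be read from the WAV header. The following is only a sketch: it assumes the simplest "canonical" 44-byte PCM header, in which the fmt chunk directly follows the RIFF header, and the helper names (parseWavHeader, meetsGuidelines, makeHeader) are made up for this illustration.

```javascript
// Sketch: read format information from a canonical 44-byte WAV header.
// Assumption: the "fmt " chunk immediately follows the RIFF header, which
// holds for simple PCM files; files with extra chunks need a real parser.
function parseWavHeader(buf) {
  if (buf.length < 44) throw new Error('too short to be a canonical WAV header');
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  if (buf.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('fmt chunk is not at the expected offset');
  }
  return {
    audioFormat: buf.readUInt16LE(20), // 1 means uncompressed linear PCM
    channels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34),
  };
}

// True when the header describes 16-bit linear PCM at 16 kHz or more.
function meetsGuidelines(h) {
  const isLinear16 = h.audioFormat === 1 && h.bitsPerSample === 16;
  return isLinear16 && h.sampleRate >= 16000;
}

// Builds a hypothetical header (no audio data) so the sketch runs without a file.
function makeHeader(sampleRate, channels, bitsPerSample) {
  const b = Buffer.alloc(44);
  b.write('RIFF', 0, 'ascii');
  b.writeUInt32LE(36, 4);                 // RIFF chunk size for empty data
  b.write('WAVE', 8, 'ascii');
  b.write('fmt ', 12, 'ascii');
  b.writeUInt32LE(16, 16);                // fmt chunk size for PCM
  b.writeUInt16LE(1, 20);                 // audio format: linear PCM
  b.writeUInt16LE(channels, 22);
  b.writeUInt32LE(sampleRate, 24);
  b.writeUInt32LE(sampleRate * channels * bitsPerSample / 8, 28); // byte rate
  b.writeUInt16LE(channels * bitsPerSample / 8, 32);              // block align
  b.writeUInt16LE(bitsPerSample, 34);
  b.write('data', 36, 'ascii');
  b.writeUInt32LE(0, 40);                 // data size
  return b;
}

const header = parseWavHeader(makeHeader(44100, 1, 16));
console.log(header, meetsGuidelines(header));
```

For a real recording, the first 44 bytes of the file would be passed to parseWavHeader instead of the generated header.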

This time, I prepared an audio file by reading aloud and recording an article I wrote earlier, "I tried to autoscale Web server with Node.js using Google Cloud's PaaS "App Engine"". The audio file is embedded below, but please mind the volume when playing it.


Upload the recorded audio file to the bucket created earlier. Because it was recorded as linear PCM, even an audio file of about 6 minutes exceeds 30 MB.
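
The size follows directly from the recording parameters, since uncompressed PCM stores every sample. As a rough sketch, assuming 44.1 kHz, 16-bit, mono audio (these parameters are an assumption, not something stated for this recording) and ignoring the small file header:

```javascript
// Uncompressed PCM size in bytes:
// sample rate * bytes per sample * channels * duration in seconds.
function pcmBytes(sampleRateHz, bitsPerSample, channels, seconds) {
  return sampleRateHz * (bitsPerSample / 8) * channels * seconds;
}

// Assumed parameters: 44.1 kHz, 16-bit, mono, 6 minutes.
const bytes = pcmBytes(44100, 16, 1, 6 * 60);
console.log((bytes / 1e6).toFixed(1) + ' MB'); // prints "31.8 MB"
```

Under those assumptions, six minutes already comes to about 31.8 MB, which is in line with a file of "about 6 minutes" exceeding 30 MB.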



Next, I will write a program to run the transcription. The documentation lists examples in various programming languages, but this time I will use "Node.js" and "Cloud Tools" in the same environment as the following article.

I tried to autoscale Web server with Node.js using Google Cloud's PaaS "App Engine" - GIGAZINE



Create a folder called "speech-to-text" in the workspace, right-click it, and select "Open in Terminal".



Enter the following in the terminal. Rewrite the parts in [] as appropriate.

[code] gcloud ml speech recognize-long-running 'gs://[name of prepared bucket]/[file name you want to recognize]' --language-code='ja-JP' --async [/ code]

For example, if you want to recognize "audio.wav" in "mojiokoshi-abcde", enter the following.

[code] gcloud ml speech recognize-long-running 'gs://mojiokoshi-abcde/audio.wav' --language-code='ja-JP' --async [/ code]

If a "bad encoding" or "bad sample rate hertz" error occurs, add the following options.

[code] --encoding=[encoding type] [/ code]

[code] --sample-rate=[sample rate] [/ code]

For example, if "audio.wav" is an audio file encoded as linear16 with a sample rate of 44100 Hz, the command to enter is as follows.

[code] gcloud ml speech recognize-long-running 'gs://mojiokoshi-abcde/audio.wav' --language-code='ja-JP' --async --encoding=linear16 --sample-rate=44100 [/ code]
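
Since the command is just a fixed prefix plus a handful of substitutions, it can also be assembled in code. The sketch below only builds the argument list from the flags shown in the commands above. The function name buildRecognizeArgs and its parameter names are made up for this illustration.

```javascript
// Sketch: assemble the arguments for `gcloud ml speech recognize-long-running`.
// Only flags that appear in the commands above are used.
function buildRecognizeArgs({ bucket, file, languageCode, encoding, sampleRate }) {
  const args = [
    'ml', 'speech', 'recognize-long-running',
    `gs://${bucket}/${file}`,
    `--language-code=${languageCode}`,
    '--async',
  ];
  // These two flags are only needed when gcloud reports
  // "bad encoding" or "bad sample rate hertz".
  if (encoding) args.push(`--encoding=${encoding}`);
  if (sampleRate) args.push(`--sample-rate=${sampleRate}`);
  return args;
}

const recognizeArgs = buildRecognizeArgs({
  bucket: 'mojiokoshi-abcde',
  file: 'audio.wav',
  languageCode: 'ja-JP',
  encoding: 'linear16',
  sampleRate: 44100,
});
console.log(['gcloud', ...recognizeArgs].join(' '));
```

Keeping the arguments as an array rather than one string means they could be handed to Node's child_process.execFile without worrying about shell quoting.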

When you enter the command, a response like the following is returned.

[code] Check operation [1234567890123456789] for status.
{
"name": "1234567890123456789"
} [/ code]


To check the status of the transcription, enter the following command using this 19-digit number.

[code] gcloud ml speech operations describe 1234567890123456789 [/ code]


However, with the above command the output streams through the terminal as a long run of characters and is hard to read ......



Add "> output.json" at the end of the command to save the output in the file "output.json".

[code] gcloud ml speech operations describe 1234567890123456789 > output.json [/ code]


Looking inside output.json, you can see the item "done" near the top. If it reads "done": true, the transcription has completed. If it has not completed, the item "progressPercent" a little further down shows what percentage is finished.
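
The same check can be done in code instead of by eye. This is a minimal sketch that assumes "done" sits at the top level and "progressPercent" sits under a "metadata" object. The sample objects are hypothetical and trimmed down for illustration.

```javascript
// Sketch: summarize the state of an operation as saved in output.json.
// Assumption: "done" is a top-level field and "progressPercent" is
// nested under "metadata".
function describeStatus(operation) {
  if (operation.done === true) return 'done';
  const percent = operation.metadata && operation.metadata.progressPercent;
  return typeof percent === 'number' ? `in progress: ${percent}%` : 'in progress';
}

// Hypothetical, trimmed-down operation objects.
const running = { name: '1234567890123456789', metadata: { progressPercent: 42 } };
const finished = { name: '1234567890123456789', done: true, metadata: { progressPercent: 100 } };

console.log(describeStatus(running));
console.log(describeStatus(finished));
```

With a real file, the argument would be the parsed contents of output.json, for example describeStatus(require('./output.json')).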



When it has finished, convert the JSON file into an easy-to-read text file. Create a new file named "mojiokoshi.js" in the same folder with the following contents.

[code] /**
 * Read output.json
 */
const data = require('./output.json');

/**
 * Fetch the necessary part from the data, format it and output it
 */
const results = data.response.results;
const transcription = results.map(result => result.alternatives[0].transcript).join('\n');
console.log(transcription); [/ code]


Then run the following in the terminal to save the conversion result in mojiokoshi.txt.

[code] node mojiokoshi.js > mojiokoshi.txt [/ code]
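
The script above assumes the operation has finished and that every result has at least one alternative. A slightly more defensive variant might look like the following sketch. The function name toTranscript and the sample data are made up for illustration, and whether empty results actually occur in practice is an assumption.

```javascript
// Sketch: the same extraction as mojiokoshi.js, with guards for an
// unfinished operation and for results that carry no alternatives.
function toTranscript(data) {
  if (!data.done || !data.response) {
    throw new Error('operation has not finished yet; check its status again later');
  }
  return (data.response.results || [])
    .filter(result => result.alternatives && result.alternatives.length > 0)
    .map(result => result.alternatives[0].transcript)
    .join('\n');
}

// Hypothetical miniature of output.json.
const sample = {
  done: true,
  response: {
    results: [
      { alternatives: [{ transcript: 'first segment' }] },
      { alternatives: [] },
      { alternatives: [{ transcript: 'second segment' }] },
    ],
  },
};
console.log(toTranscript(sample));
```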



The content of mojiokoshi.txt was as follows.

I tried to write a Web server that autoscales using Google's cloud PS up carrots by Node. JS
The Google app engine provided by Google is a cloud surf classified as PAS bracket platform as a Service. It can be an application that scales according to the number of accesses without doing anything from configuring infrastructure such as server etc. This service The standard environment or June 1998 2 Node JS was supported and we installed a Web server using Node JS soon Although we are using a school chair this time we will use the same procedure on Windows You can install the server First if nodejs is not installed, download the lts version from the official site and install it and prepare an editor for programming Any editor is fine, but this time Visual Studio CODE is prepared You can download by clicking the green button on the left side of the screen Run the file that was loaded and then click Add a workspace folder will be in the lower screen when you start to install the Visual Studio code
Create a folder with the name Node Web Server and click Add. Since the previous folder is added to the Explorer workspace at the top of the left column, open it in the terminal from the menu displayed by right clicking Click on it and the terminal is displayed in the lower right Make sure the terminal is open Node Web Server and enter NPM init Y
NPM NPM is a package management tool for Node JS that is installed at the same time as Node JS Entering the command NPM init will generate NPM 2 initialization action bracket initialization package management village that is invalid Yes I will write an explanation of what kind of pose it is originally in interactive format By attaching the option of bogey it is possible to make package dependency with all default default settings Subsequent setup of server I will install pack express which will make it easy
Entering NPM install Express creates a folder called Node Module under Node Web Server and installs Express and its library to run it. Also simultaneously generated package LOCK JSON is used to record the library version This package LOCK JSON for sharing behavior with other people in code I can understand what version of which library to use I can understand with one shot Mail relay server body behavior change I will change Node Web Server to folder Up. I will organize the file called JS. I refer to Express Hello Worldcode up. I borrowed money from JS
up. Enter GS 2 code Terminal 2 Node App. Enter JS and run update with Node JS Accessing the local host gave Hello world safe when you exit the running application Tatami mat tatami This time I'm deploying the Google app engine to display the simple hello world of a child who does not know strongly about the details of the Express To use the Google app engine Google's cloud service This song on Google Cloud platform who is not registered yet by anyone please register at the beginning of the following article reference please refer to the beginning of the following article Free Cloud on Google Cloud free up on McAfee Server on Google Cloud Multiplayer Multiplayer method Install the tool Cloud tool for managing the Google Cloud platform My OS Choice Download Cloud sdk installation and execute it Cloud tool If you can install all the good applications will set up for playing with the Google Up engine First of all, it is up engine up by which port number should I use? is this rewritten to use the environment variable that contains the process is information that what is moving in any environment for not determined with certainty is the time to move on Engine Jojo Retro Technote Port 2 I was able to stand by now It will become rotten Next is the setting of how to do with Google app Engine It will be up soon I will make an upgrade and I will create a quit Down to the same folder as the up engine Upgrade and hear rice of sperm squid Because you can use the G Club Cloud command thanks to the club tourism installation car you can use the terminal so you can do it at the terminal and press enter key and then press the enter key The confirmation dialog like the one below will appear so please check The application came out When this was completed I entered Siegeliteup blouse and the page of the deployment destination is displayed. 
If it was displayed as Hello World it was a success but somehow an error was displayed In the inside of the up engine As to whether an error like this is occurring can not be confirmed at the bottom of the dashboard of the up engine As you can see, it is usually only by searching this error message on Google What can I do? Kim did not hit anything I will check it by myself I tried various experiments ROGUYA Package It seems there is a risk of rewriting the script part of the tension as below Autumn Jejun's clothes If you fix the lipstick G Cloud up Deploy in the terminal you wish to reenter It was opened in the G cloud-up browser earlier When using the page properly as a hello world the server that saw it with a piece of axis shows that it automatically wears Access 2 and the fee will be generated according to the scaling of the teeth 5 When I made a mistake why I charged it In order to make sure not to be charged you can set a maximum usage limit of 1 day from the setting tab on the left


Even though this is recognition by AI, it generated reasonably "readable" text.

When I tried the transcription function of "AI maker", which automatically transcribes the contents of YouTube movies, it did not transcribe well, and the suggestion came up that this may have been because I used a downloaded MP3 file. In other words, "compressing to MP3 may greatly degrade recognition accuracy on machines". So I converted the WAV file created this time to MP3 and then back to WAV, to see how much difference it makes.

Conversion can be done with any software, but this time we used the open source software Sound eXchange (SoX).

SoX - Sound eXchange download | SourceForge.net
https://sourceforge.net/projects/sox/



sox is a command-line tool. For example, you can convert "yomiage.wav" to "yomiage.mp3" with the following command.

[code] sox yomiage.wav yomiage.mp3 [/ code]

As expected, MP3 compression is powerful: the audio file, which was originally 32.4 MB, was compressed to 2.9 MB, less than 1/10 of its size. I then converted this back into a file called "yomiage_mp3.wav" with the following command.

[code] sox yomiage.mp3 yomiage_mp3.wav [/ code]


Having the AI recognize it again produced the following text.

The Google app engine which Google authors to autoscale using Google 's cloud PS upgrade carrots by Node. JS is a browser classified as platform as a Service Google's engine engine is infrastructure such as a server It seems that you can set up applications that will be able to respond to the number of accesses without doing anything to set up Node JS is used as soon as it corresponds to the standard environment of this Google up engine or the OS of June 2, 1018 I tried installing the Web server Although we are using wax this time, you can install the server in the same procedure on Windows first. If nodejs is not installed first download the lts version from the official website and install it I will prepare an editor for programming Any Erika is okay but this time Visual Prepare Studio CODE Click the green button on the left side of the screen to download it. Run the downloaded file and install Visual Studio code and it will become the screen below. Click Add workspace folder
Create a folder with the name Node Web Server and click Add. Since the previous folder is added to the Explorer workspace at the top of the left column, open it in the terminal from the menu displayed by right clicking Click and you will see the terminal at the bottom right Make sure the terminal is open Node Web Server and enter NPM init Y
NPM NPM is a package management tool for Node JS that is installed at the same time as Node JS Entering the command NPM init can generate NPM 2 initialization action bracket initializer package management invalid package tweet Originally I will write an explanation of what kind of dialogue type, but by attaching the option Y you can make package dependency with default settings as it is, making server setup super easy Install NPM install Express NPM install Express will create a folder named Load Module under Node Web Server and a library to run it will be installed The package LOCK JSON that is also formed at the same time Library bar In order to share actions with others with code to record John this package can understand which version of library etc should be used I can understand with one shot I will change the behavior of the server Node Web Server in a folder I uploaded it with reference to Hello Worldcode of JS file express. I tried to write flower-like behavior in JS
After entering the GS 2 code, enter Terminal 2 Node App. JS and execute update with Node JS. Accessing the local host safely showed Hello world safe When exiting the running application We enter the control key on the terminal at this time We are deploying the Google app engine to display this simple Hello World without going into using the details of the Express this time To use the Google app engine Google cloud This song to the Google Cloud platform which is the service If you are not registered yet by anyone please register at the beginning of the following article please refer to the beginning of the article below Free Google Cloud free Google Cloud Minecraft server on Google's cloud Multiplayer with multiple people How to manage Google Cloud platform Tool cloud tool how Download and run and you install of your OS choices Cloud sdk
Cloud 2 Ruby If you can install a good application you set up for playing with 2 in the Google application First of all, about Apple Pages we created earlier, which port number should be used when running on the up engine clearly and now has wanna what rewriting you doing to it so that the information that what is moving in the environment is to use the environment variable that contains the process all to listen to up engine Jaws supported by the can-chan for not determined the I'm going to set up what it is like with Google app engine on Ebri and get up and quit Creating the same folder as the up engine 2 Apple Pay Make a mole and rewrite it like a movie Thanks to the club tourism installation car In order to use the command g Cloud Since it becomes, it inputs d card up deployment at the terminal and presses the enter key. Since I am writing the confirmation dial like below I checked and the application got out When the application is completed I enter Siegelight up blouse The previous page will be displayed If it is displayed as Hello World it was successful but somehow an error has been displayed For details on what kind of error inside the up engine is happening As you can see on the bottom of the dashboard As you can see most of the error message just searching Google on Google will do something gold or anything did not hit so I checked it by myself Apparently I tried to find a package Rewrite the script portion of the relation as follows It seems there is a risk I've fixed the script of Park Jae Hyun G Cloud up to the traveling terminal Deploy and black again I reloaded the page I opened with G Cloud Up browser I saw a dangerous hello world cottage Kuroda Penguin The server is supposed to wear it as it meets the request and the fee will be charged according to the scaling of the teeth 5 I made a mistake In order to prevent you from being billed for the price when I made it The setting tab on the left It is OK if you set a usage limit for 1 day from
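
To compare two transcripts like the ones above with a number rather than by feel, one common measure is the word error rate (WER): the word-level edit distance between a correct reference text and the transcript, divided by the number of reference words. The sketch below is not something used in this experiment. It splits on whitespace, so for Japanese audio (as recognized here with ja-JP) the text would first need to be split into words or compared character by character.

```javascript
// Sketch: word error rate (WER) between a reference text and a transcript,
// computed as word-level Levenshtein distance / number of reference words.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) throw new Error('reference must not be empty');

  // prev[j] = edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const cur = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;     // reference word missing from transcript
      const insertion = cur[j - 1] + 1; // extra word in transcript
      cur[j] = Math.min(substitution, deletion, insertion);
    }
    prev = cur;
  }
  return prev[hyp.length] / ref.length;
}

// One substituted word out of eight reference words.
console.log(wordErrorRate(
  'download the lts version from the official site',
  'download the lts version from the official website'
)); // prints 0.125
```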


It does feel slightly less accurate, but it does not seem to get dramatically worse. That suggests the problem lies with AI maker, so let's have Google transcribe the file that was given to AI maker. AI maker accepted the MP3 as it was, but for Google I converted it to WAV before uploading. The earlier result from AI maker looked like the image below ... ...


Google's transcription result this time is as follows.

I would like to have it There is nothing to do with this I am not alone I am participating in various things I was delayed for a while but there was also a story about doing live women's Nico Nico live broadcast YouTube live and it is going to rain so it seems to be raining so it is possible that there is a possibility of reflection of everyone in the company There is a moment when you shed it on the way but everyone in this distribution Because it can not flow on the net I think that I would like you to be part of it a bit more I think I would like you to be President Sakuma Co., Ltd. Good Smile Company Designer's request I wish you English English I wish Thank you very much for this neighborhood And this time around Designer who received the product design of Shiko-san please do it fun Thank you I am a person who can announce the Tokyo is actually a year since I was allowed to do cloud funding start of last year, and one year passed and the first time is about two months though I was able to go alone It is the final I do not talk about this child This is the figure at the end of the cloud funding though it is because the number of 304055500 yen is 1250 people and what can be done later All shapes To join and that the difference between mother and child is with that one of his friends Nevertheless Cub is


AI maker is excellent in that it is easy to use, but when it comes to the accuracy of transcription from audio, Google seems to have the edge. The Cloud Speech-to-Text service is free for the first 60 minutes each month, and beyond 60 minutes it costs 0.6 yen per 15 seconds.
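
As a rough sketch of what that pricing means in practice, the following computes a monthly charge from the total audio length. It uses only the figures quoted above. Rounding the billable time up to whole 15-second units is an assumption made here, and current prices should be checked against Google's pricing page.

```javascript
// Sketch of the pricing stated above: the first 60 minutes per month are
// free, then 0.6 yen per 15 seconds. Rounding up to whole 15-second units
// is an assumption.
const FREE_SECONDS = 60 * 60;
const UNIT_SECONDS = 15;
const TENTHS_OF_YEN_PER_UNIT = 6; // 0.6 yen, kept as an integer to avoid float error

function monthlyCostYen(totalSeconds) {
  const billable = Math.max(0, totalSeconds - FREE_SECONDS);
  const units = Math.ceil(billable / UNIT_SECONDS);
  return (units * TENTHS_OF_YEN_PER_UNIT) / 10;
}

console.log(monthlyCostYen(6 * 60));       // the roughly 6-minute file used here: 0
console.log(monthlyCostYen(10 * 60 * 60)); // hypothetical 10 hours in a month: 1296
```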

in Review,   Software,   Web Service, Posted by log1d_ts