I built a server that automatically generates GIGAZINE's related articles using machine learning



Since June, the "related content" section at the end of GIGAZINE articles has switched from Google AdSense to a system that uses a machine learning server developed in-house.

◆ Why we developed our own related article generation system and machine learning server
The reason is simple: the accuracy of "Related content", the Google AdSense feature we previously used to generate related articles automatically, was not very high. To find out why the precision was low, I talked directly with the engineers in charge of Google AdSense over Google Hangouts and asked about the reasons. As a result, I concluded that, thinking about the future, building this ourselves was a perfectly viable option.

Also, as stated on Google's site, displaying related articles properly is said to improve pageviews by 9% and time on site by 10% on average for recommended content, so this was a measure that could raise the numbers considerably across the entire site.

The "related content" section at the end of each article is the list of related articles automatically generated by the machine learning server.


◆ What kind of mechanism is it?
Roughly speaking, related articles are generated using the Doc2Vec feature of Gensim. Nowadays, FastText might make this even easier.
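As a rough illustration of the Doc2Vec approach, here is a minimal sketch that assumes Gensim 4.x is installed. The tokenizer, the function names, the string article IDs, and the parameter values are all hypothetical and are not GIGAZINE's actual settings.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    This is a simplistic stand-in for real preprocessing; Japanese text
    would need a morphological analyzer instead of a regular expression.
    """
    return re.findall(r"\w+", text.lower())


def train_doc2vec(articles, vector_size=50, epochs=40):
    """Train a Doc2Vec model from a {article_id: text} dict (string IDs)."""
    # Imported here so that tokenize() stays usable without Gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [
        TaggedDocument(words=tokenize(text), tags=[article_id])
        for article_id, text in articles.items()
    ]
    model = Doc2Vec(vector_size=vector_size, min_count=1, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return model


def related_articles(model, article_id, topn=5):
    """Return (article_id, similarity) pairs for an article seen in training."""
    return model.dv.most_similar(article_id, topn=topn)
```

With a trained model, `related_articles(model, "some-article-id")` would return the nearest articles in the learned vector space. Parameter values such as `vector_size` usually need tuning against the real corpus.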

The configuration consists of two servers: a related article display server and a machine learning server. Machine learning consumes a lot of memory and CPU usage is also high during training, which is why the two are separated.


Because GIGAZINE articles are updated every day, the model is retrained several times a day. At training time, the articles that have been updated are transferred from the related article display server to the machine learning server.
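The article does not describe how the updated articles are selected. One possibility, sketched below, is to compare each article's update time with the time of the previous training run. The function name, the dict keys, and the sample timestamps are hypothetical.

```python
from datetime import datetime


def articles_to_transfer(articles, last_trained_at):
    """Return the articles updated after the previous training run, oldest first.

    `articles` is a list of dicts with the hypothetical keys "id" and
    "updated_at" (a datetime).
    """
    updated = [a for a in articles if a["updated_at"] > last_trained_at]
    return sorted(updated, key=lambda a: a["updated_at"])


# Hypothetical sample data, for illustration only.
sample = [
    {"id": "a1", "updated_at": datetime(2000, 1, 1, 9, 0)},
    {"id": "a2", "updated_at": datetime(2000, 1, 2, 12, 0)},
    {"id": "a3", "updated_at": datetime(2000, 1, 2, 8, 0)},
]
batch = articles_to_transfer(sample, last_trained_at=datetime(2000, 1, 1, 12, 0))
print([a["id"] for a in batch])
```

Sending only the changed articles keeps each of the several daily transfers small.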


Related articles are retrieved as follows.

1: The related article display server sends the article ID to the machine learning server via the API.
2: The machine learning server looks up the learning data based on the article ID.
3: The resulting related article list is returned from the machine learning server to the related article display server.
4: The related articles are displayed on GIGAZINE.
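The four steps above can be sketched as follows. This is only an illustration: the class and function names are hypothetical, the API call is replaced by a direct method call, and small hand-written vectors stand in for the learned Doc2Vec data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class LearningServer:
    """Stand-in for the machine learning server (steps 2 and 3)."""

    def __init__(self, vectors):
        self.vectors = vectors  # {article_id: learned vector}

    def related(self, article_id, topn=2):
        """Look up the article's vector and rank all other articles against it."""
        target = self.vectors[article_id]
        scored = [
            (other_id, cosine(target, vec))
            for other_id, vec in self.vectors.items()
            if other_id != article_id
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [other_id for other_id, _ in scored[:topn]]


def display_related(server, article_id):
    """Stand-in for the display server (steps 1 and 4)."""
    return server.related(article_id)


# Hypothetical toy vectors, for illustration only.
vectors = {
    "cpu-review": [0.9, 0.1, 0.0],
    "ssd-review": [0.8, 0.3, 0.1],
    "movie-news": [0.0, 0.2, 0.9],
    "server-build": [0.7, 0.2, 0.1],
}
server = LearningServer(vectors)
print(display_related(server, "cpu-review"))
```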


The newest articles are not yet in the learning data, so their related articles are predicted from the existing learning data. The concrete method is as follows.

1: The latest article is sent to the learning server.
2: The latest article is converted into simple learning data.
3: The simple learning data is compared with the learning data in the machine learning server, and a list of articles similar to the simple learning data is picked from the learning data.
4: The resulting related article list is returned from the machine learning server to the related article display server.
5: The related articles are displayed on GIGAZINE.
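Steps 2 and 3 above can be illustrated with a simplified stand-in. Here the "simple learning data" is a plain word-count vector rather than a Doc2Vec vector (in Gensim, a model's `infer_vector` method plays this role). All names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def to_simple_data(text):
    """Convert article text into a word-count vector (stand-in for step 2)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_for_new_article(text, learned, topn=2):
    """Rank already-learned articles by similarity to an unseen article (step 3)."""
    query = to_simple_data(text)
    ranked = sorted(learned, key=lambda aid: cosine(query, learned[aid]), reverse=True)
    return ranked[:topn]


# Hypothetical learned data, for illustration only.
learned = {
    "xeon-server": to_simple_data("building a xeon server with redundant power supply"),
    "ssd-bench": to_simple_data("ssd benchmark results for a new server"),
    "cat-video": to_simple_data("a cute cat video goes viral"),
}
print(related_for_new_article("a machine learning server with two xeon cpus", learned))
```

The new article never needs to be part of the trained data; it only has to be mapped into the same vector space as the learned articles.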


In this way, GIGAZINE's related articles continue to be generated automatically every day.

◆ What kind of server is running?

There is not just one learning server; a spare server is also prepared. With a spare in place, we can switch over even if the main learning server malfunctions.
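The article does not say how the switch to the spare is performed. One possible approach, sketched here under that assumption, is for the caller to try the main server first and fall back to the spare on failure. The function names and the fake server callables are hypothetical.

```python
def fetch_related(article_id, servers):
    """Try each learning server in order and return the first successful result.

    `servers` is a list of (name, callable) pairs. Each callable takes an
    article ID and returns a list of related article IDs, or raises on failure.
    """
    errors = []
    for name, query in servers:
        try:
            return name, query(article_id)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all learning servers failed: " + "; ".join(errors))


def broken_main(article_id):
    """Simulates a main server that is down."""
    raise ConnectionError("main server is down")


def healthy_spare(article_id):
    """Simulates a spare server that answers normally."""
    return ["related-1", "related-2"]


used, result = fetch_related(
    "article-123", [("main", broken_main), ("spare", healthy_spare)]
)
print(used, result)
```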


The main components are as follows.

· CPU 1: Xeon E5-2630 v4 BOX
· CPU 2: Xeon E5-2630 v4 BOX
· CPU cooler 1: ETS-N30R-HE
· CPU cooler 2: ETS-N30R-HE
· Memory: CT2K8G4RFD8213
· Motherboard: Z10PE-D8 WS with IPMI
· Power supply: EVGA SuperNOVA 1000 G2 80+ GOLD
· Simple redundant power supply kit: Phanteks Power Combo
· SSD 1: 850 EVO MZ-75E1T0B/IT
· SSD 2: 850 EVO MZ-75E1T0B/IT
· Case: Enthoo Primo

The main learning server is a custom-built PC with two Intel Xeon CPUs to secure computing resources. Considering server maintenance fees and parts costs, a custom build is easier to manage and keeps costs down, which is why the machine learning server ended up like this. Until recently it was difficult to make the power supply redundant, but there is now a simple redundant power supply kit called "Power Combo" that makes it easy to implement.

Phanteks Innovative Computer Hardware Design


IPMI is normally used to monitor a server's power supply, but power monitoring cannot be performed with a simple redundant power supply kit. So I decided to attach a UPS to each power supply and monitor the power load on each UPS.
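The article does not describe the monitoring logic. As one hedged illustration of the idea, a UPS whose load drops to nearly zero suggests that the power supply attached to it has stopped drawing power. The function name, the threshold, and the readings below are hypothetical; real values would come from each UPS's own monitoring interface.

```python
def check_power_supplies(ups_loads_watts, min_load_watts=10.0):
    """Return the names of UPS units whose load is suspiciously low.

    The threshold is an assumption and would need tuning per machine.
    """
    return [
        name for name, load in ups_loads_watts.items() if load < min_load_watts
    ]


# Hypothetical readings, for illustration only.
readings = {"ups-1": 180.0, "ups-2": 2.5}
print(check_power_supplies(readings))
```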

For the spare learning server, I decided to repurpose a DELL "PowerEdge T310" that had gone out of maintenance. Since this server can manage its own power state with IPMI, UPS monitoring is not necessary.

The current machine learning servers run every day in the server room. The one in front is the main machine and the one in the back is the spare.


Looking at the back of the machine learning server, you can see that the power supply is redundant.


So, GIGAZINE is looking for editorial staff who want to write machine learning and server articles like this one! You may be thinking, "What? Not a server administrator or a system developer?", but we really are recruiting "editorial staff."

That is because we expect that, more and more, articles about machine learning will be impossible to write without specialist knowledge. The purpose of this article, therefore, is to say that we would very much like people like you, who saw this title and ended up reading the whole article, to join the editorial department. The person who built this machine learning server is also waiting for you!

For details on editorial staff recruitment, please refer to the following article.

"Please go to a company that cannot fully utilize your capabilities" GIGAZINE editorial staff recruitment Q&A summary - GIGAZINE


I think that article will give you a general sense of the atmosphere, so if you feel "I could do this," strike while the iron is hot; we would be very happy if you applied right away. As a sign that you read this article, feel free to write on your resume, "I read the article about the server that automatically generates GIGAZINE's related articles using machine learning." Thank you.

in Review,   Software,   Hardware,   Pick Up, Posted by darkhorse