The high-speed Japanese full-text search engine 'Mroonga' dramatically speeds up the editorial department's search of past articles and improves work efficiency
When writing a new article, GIGAZINE checks whether a past article already covers the same content. With more than 70,000 articles as of August 2021, the standard search form was slow to return results, which hurt work efficiency. We therefore built a search system that can search past articles quickly.
To avoid affecting the existing system, we prepared a new server running a database application capable of high-speed search. This time we used Mroonga.
Mroonga - Fast Japanese full-text search with MySQL
https://mroonga.org/ja/
Install and configure it by following the official installation procedure.
Enable the universe repository and the security update repository.
$ sudo apt-get install -y -V software-properties-common lsb-release
$ sudo add-apt-repository -y universe
$ sudo add-apt-repository "deb http://security.ubuntu.com/ubuntu $(lsb_release --short --codename)-security main restricted"
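One detail worth noting about the last command: the `$(lsb_release --short --codename)` substitution only expands when the repository line is wrapped in double quotes. The following is a minimal sketch of the difference. The `codename` function is a hypothetical stand-in for `lsb_release --short --codename`, and the value `focal` is an assumption chosen only so the snippet runs on any machine.

```shell
# Hypothetical stand-in for `lsb_release --short --codename`
# (assumption: the release codename is "focal").
codename() { echo "focal"; }

# Double quotes: the shell expands $(...) before the string is passed on.
expanded="deb http://security.ubuntu.com/ubuntu $(codename)-security main restricted"

# Single quotes: the text is kept literally, so the codename is never filled in.
literal='deb http://security.ubuntu.com/ubuntu $(codename)-security main restricted'

echo "$expanded"
echo "$literal"
```

Printing both strings makes it easy to confirm which form would be handed to add-apt-repository.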
Add the ppa:groonga/ppa repository to the system.
$ sudo add-apt-repository -y ppa:groonga/ppa
$ sudo apt-get update
Install Mroonga for MariaDB.
$ sudo apt-get install -y -V mariadb-server-mroonga
Install groonga-tokenizer-mecab.
$ sudo apt-get install -y -V groonga-tokenizer-mecab
Edit the config file to make the default tokenizer MeCab.
$ sudo vi /etc/mysql/mariadb.conf.d/50-server.cnf
Add the following line to the [mysqld] section:
mroonga_default_tokenizer = TokenMecab
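As a rough illustration of what the edited section might look like, the sketch below writes the fragment to a scratch file rather than to the real 50-server.cnf, so it changes nothing on the system. The scratch file and the `conf_file` variable are assumptions made for the example.

```shell
# Sketch only: use a scratch file instead of the real
# /etc/mysql/mariadb.conf.d/50-server.cnf.
conf_file="$(mktemp)"

cat > "$conf_file" <<'EOF'
[mysqld]
# Tokenizer used by Mroonga full-text indexes that do not name one themselves
mroonga_default_tokenizer = TokenMecab
EOF

# Confirm the setting is present, then show the whole fragment.
grep -n 'mroonga_default_tokenizer' "$conf_file"
cat "$conf_file"
```

After changing the real file, MariaDB has to be restarted for the setting to take effect. Running `SHOW VARIABLES LIKE 'mroonga_default_tokenizer';` in the client is one way to confirm it.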
After also adjusting settings such as the cache and logs to suit the environment, check that everything works.
$ sudo mysql -u root
> select mroonga_command('tokenize TokenMecab "Tokyo"');
mroonga_command('tokenize TokenMecab "Tokyo"')
(Message etc. omitted)
1 row in set (0.023 sec)
In this environment, we created a database to hold GIGAZINE articles and prepared a table in which article bodies and titles are added and updated as needed. Specify 'mroonga' as the storage engine for this table.
Example:
> CREATE TABLE [table name] (column 1, column 2, …) ENGINE = Mroonga;
Add a full-text index to the column you want to search at high speed.
Example:
> ALTER TABLE [table name] ADD FULLTEXT INDEX [index name] (column) COMMENT 'tokenizer "TokenMecab"';
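To make the two placeholder statements concrete, here is a minimal sketch that writes a possible schema to a file. It reuses the `article_data` table and `data` column that appear in the queries later in this article. The `id` and `title` columns, their types, the character set, the index name `idx_data`, and the database name `gigazine_search` are all assumptions for illustration, not the actual GIGAZINE schema.

```shell
# Write a hypothetical schema to a file; nothing is sent to a database here.
schema_file="${TMPDIR:-/tmp}/schema.sql"

cat > "$schema_file" <<'SQL'
CREATE TABLE article_data (
  id    INT UNSIGNED NOT NULL PRIMARY KEY,  -- hypothetical article ID
  title VARCHAR(255) NOT NULL,              -- hypothetical title column
  data  MEDIUMTEXT   NOT NULL               -- article body to be searched
) ENGINE = Mroonga DEFAULT CHARSET = utf8mb4;

ALTER TABLE article_data
  ADD FULLTEXT INDEX idx_data (data) COMMENT 'tokenizer "TokenMecab"';
SQL

# To apply it, assuming a database named gigazine_search already exists:
#   sudo mysql gigazine_search < "$schema_file"
cat "$schema_file"
```

Keeping the schema in a file makes it easy to review before applying it and to re-create the table on another server.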
In this way, we built a database that can search the contents of past articles at high speed. Let's test how much the search time actually differs.
Searching for 'McDonald's' in an existing database took 6.87 seconds.
> select count(*) from article_data where data like '%McDonald''s%';
+----------+
| count(*) |
+----------+
|     1136 |
+----------+
1 row in set (6.87 sec)
With the high-speed search system, the same search for 'McDonald's' returns its result in 0.00 seconds, which shows the power of high-speed search.
> SELECT count(*) FROM article_data WHERE MATCH(data) AGAINST('+McDonald''s' IN BOOLEAN MODE);
+----------+
| count(*) |
+----------+
|     1086 |
+----------+
1 row in set (0.00 sec)
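A search term such as McDonald's contains a single quote, which must be doubled inside a SQL string literal. The sketch below shows one way to build the two queries compared above from an arbitrary term. The helper names `sql_quote`, `like_query`, and `match_query` are hypothetical. The quoting handles only single quotes: it does not escape backslashes, LIKE wildcards, or boolean-mode operators, so treat it as an illustration rather than a complete defense.

```shell
# Double every single quote so the term can sit inside a SQL string literal.
sql_quote() {
  printf '%s' "$1" | sed "s/'/''/g"
}

# Substring match: scans the column with LIKE.
like_query() {
  printf "SELECT COUNT(*) FROM article_data WHERE data LIKE '%%%s%%';\n" "$(sql_quote "$1")"
}

# Full-text match: uses the FULLTEXT index in boolean mode.
match_query() {
  printf "SELECT COUNT(*) FROM article_data WHERE MATCH(data) AGAINST('+%s' IN BOOLEAN MODE);\n" "$(sql_quote "$1")"
}

like_query "McDonald's"
match_query "McDonald's"
```

The generated statements can be piped to the client, for example `match_query "McDonald's" | sudo mysql [database name]`. In application code, prepared statements are usually the safer choice.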
By adding a search page backed by this high-speed search database to the management screen used by the editorial department, search efficiency improved dramatically.
At GIGAZINE, we not only write articles but also do day-to-day development like this to improve work efficiency.
GIGAZINE is currently recruiting people who are interested in this kind of work. We look forward to your application.
GIGAZINE Employment Information. – There are things that GIGAZINE can do.
https://gigazine.co.jp/