May 13, 2023 18:00:00

I tried 'Metarank', an open-source recommendation engine that makes it easy to surface content matched to each individual's taste

As the four-character idiom 'ten people, ten colors' suggests, each person wants to see different content. If web services and apps can display appropriate content for each user, satisfaction with the service is likely to increase. 'Metarank' is an open-source, self-hostable recommendation engine that makes it easy to personalize results for each individual, so I tried it out to see what it is like.

Metarank - open-source personalized ranker

https://www.metarank.ai/

metarank/metarank: A low code Machine Learning personalized ranking service for articles, listings, search results, recommendations that boosts user engagement. A friendly Learn-to-Rank engine
https://github.com/metarank/metarank

Metarank is started with Docker, so first install Docker using the method that suits your environment from the link below.

Install Docker Engine | Docker Documentation
https://docs.docker.com/engine/install/

I am using CentOS this time, so I ran the following commands.

[code] sudo yum install -y yum-utils
sudo yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl start docker[/code]

I would like to start Metarank right away, but first I need to prepare the source data for personalization. Metarank learns from four kinds of information: 'content information', 'user information', 'what was displayed to the user', and 'what the user selected'. All four are handled in a single data format called an 'Event'.

In a 'content information' event, "event": "item" indicates that the event adds or updates content information, and "item": "item1" specifies the content ID. Metarank returns its recommendation results using this ID. In "fields" you can put any data that seems useful for making recommendations.
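
As a rough illustration of such an event, here is a minimal sketch in Python that builds one and prints it as JSON. Only "event", "item", and "fields" come from the description above. The "id" and "timestamp" keys, the name/value layout inside "fields", and every value are assumptions made for this example, so check them against the Metarank documentation before relying on them.

```python
import json

# Hypothetical "item" event. The "id" and "timestamp" keys and all field
# names/values are assumptions for illustration, not taken from the article.
item_event = {
    "event": "item",               # add/update of content information
    "id": "event-item-1",          # assumed: unique ID of this event
    "timestamp": "1683968400000",  # assumed: Unix time in milliseconds
    "item": "item1",               # content ID returned in recommendations
    "fields": [                    # data that may help when recommending
        {"name": "title", "value": "Cat tower"},
        {"name": "price", "value": 49.0},
        {"name": "tags", "value": ["cat", "furniture"]},
    ],
}

print(json.dumps(item_event, indent=2))
```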

User information is added or updated in the same format, except that the 'event' value becomes 'user' and the user ID is specified in the 'user' field.
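
A user event could be sketched in the same way. The "event" and "user" keys follow the description above. The "id", "timestamp", and "fields" entries are hypothetical values chosen only to make the example complete.

```python
import json

# Hypothetical "user" event. Everything except "event" and "user" is an
# assumption for illustration.
user_event = {
    "event": "user",               # add/update of user information
    "id": "event-user-1",          # assumed: unique ID of this event
    "timestamp": "1683968400000",  # assumed: Unix time in milliseconds
    "user": "user1",               # user ID
    "fields": [                    # assumed attributes of the user
        {"name": "age", "value": 33},
        {"name": "country", "value": "JP"},
    ],
}

print(json.dumps(user_event, indent=2))
```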

The 'what was displayed to the user' information lets Metarank take into account not only what the user chose but also what the user did not choose. For example, such an event can record that when 'user1' searched for the word 'cat', 'item3', 'item1', and 'item2' were displayed in that order. If you put the score from a ranking system other than Metarank in the 'relevancy' field of each item, accuracy improves. If you use only Metarank, enter the Metarank recommendation result as it is.
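
The 'cat' search described above might be represented roughly as follows. The user, the item order, and the 'relevancy' field come from the text. The "ranking" event name, the "id", "session", and "timestamp" keys, the way the query is stored in "fields", and the relevancy numbers themselves are assumptions for this sketch.

```python
import json

# Hypothetical "ranking" event: what was displayed to user1 for the query "cat".
ranking_event = {
    "event": "ranking",
    "id": "ranking-1",             # assumed: ID that later events can refer to
    "timestamp": "1683968400000",  # assumed: Unix time in milliseconds
    "user": "user1",
    "session": "session1",         # assumed session identifier
    "fields": [{"name": "query", "value": "cat"}],  # assumed query layout
    "items": [                     # display order, top to bottom
        {"id": "item3", "relevancy": 2.0},  # scores from another ranker (made up)
        {"id": "item1", "relevancy": 1.0},
        {"id": "item2", "relevancy": 0.5},
    ],
}

shown_order = [entry["id"] for entry in ranking_event["items"]]
print(shown_order)
print(json.dumps(ranking_event))
```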

Finally, the 'what the user selected' event tells Metarank which content should have been recommended. For example, it can record that user1 purchased item1.
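
A purchase of item1 by user1 could be sketched like this. The "interaction" event name, the "type" value, and the "ranking" key that links back to the earlier display event are assumptions based on how Metarank events are commonly shown, and the IDs are hypothetical.

```python
import json

# Hypothetical "interaction" event: user1 purchased item1.
interaction_event = {
    "event": "interaction",
    "id": "interaction-1",         # assumed: unique ID of this event
    "ranking": "ranking-1",        # assumed: ID of the display event it answers
    "timestamp": "1683968460000",  # assumed: Unix time in milliseconds
    "user": "user1",
    "session": "session1",         # assumed session identifier
    "type": "purchase",            # assumed interaction type name
    "item": "item1",               # the content the user chose
}

print(json.dumps(interaction_event, indent=2))
```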

Starting from a blank state, feeding in these four kinds of events lets Metarank steadily improve the accuracy of its recommendations. Events can be sent one by one after startup, or they can be collected in JSONL format, gzip-compressed, and loaded directly from a local file specified at startup.
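
Collecting events into a gzip-compressed JSONL file can be done with the Python standard library alone. The sketch below writes one JSON object per line and reads the file back to check it. The two events and the output location are placeholders, and the event keys other than those described above are assumptions.

```python
import gzip
import json
import os
import tempfile


def write_events(path, events):
    """Write one JSON object per line (JSONL) into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")


def read_events(path):
    """Read the gzip-compressed JSONL file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Placeholder events; "id" and "timestamp" are assumed keys.
events = [
    {"event": "item", "id": "e1", "timestamp": "1683968400000",
     "item": "item1", "fields": [{"name": "title", "value": "Cat tower"}]},
    {"event": "user", "id": "e2", "timestamp": "1683968400000",
     "user": "user1", "fields": []},
]

out_path = os.path.join(tempfile.mkdtemp(), "events.jsonl.gz")
write_events(out_path, events)
print(len(read_events(out_path)), "events written to", out_path)
```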

This time, I will use the official demo data. It contains information on a large number of movies, and each movie record includes fields such as 'title', 'rating', 'genre', 'actor', and 'director'.

Download the event data with the command below.
[code]curl -O -L https://github.com/metarank/metarank/raw/master/src/test/resources/ranklens/events/events.jsonl.gz[/code]

Next, prepare a 'config.yml' file that sets the model name and the weighting of which parts of the event information should be emphasized. This time, I downloaded the officially tuned file with the command below. Even when you use your own data, the file can be generated automatically to some extent, so you only have to modify it a little.
[code]curl -O -L https://raw.githubusercontent.com/metarank/metarank/master/src/test/resources/ranklens/config.yml[/code]

Once the event data and configuration file are ready, start Metarank with the command below.
[code]docker run -i -t -p 8080:8080 -v $(pwd):/opt/metarank metarank/metarank:latest standalone --config /opt/metarank/config.yml --data /opt/metarank/events.jsonl.gz[/code]

Metarank loads the event data and trains the model automatically. When the screen below is displayed, preparation is complete.

Let's try Metarank right away. Recommendation results are retrieved through an API called the 'Rank API'. This time, I entered the command described in the quick start guide. The URL contains 'rank', which points to the Rank API endpoint, followed by the model name 'xgboost'. The 'items' field specifies which content you want ranked. Metarank does not generate recommendations from scratch; it personalizes the results of other search engines and recommendation engines. Here, the 'items' field contains the top 100 science fiction movies.
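
A Rank API call along these lines could be sketched in Python as follows. The port comes from the docker command and the model name from the text. The exact URL layout, the request keys, the user and session names, and the candidate IDs other than '1580' are assumptions or placeholders. The helper only sends the request when you call it, so the block runs even without a Metarank instance.

```python
import json
import urllib.request

# Assumed URL layout: /rank/<model name> on the port published by Docker.
RANK_URL = "http://localhost:8080/rank/xgboost"


def build_rank_request(request_id, user, session, item_ids, timestamp_ms):
    """Build the request body: the candidates Metarank should re-rank."""
    return {
        "event": "ranking",
        "id": request_id,
        "user": user,
        "session": session,
        "timestamp": timestamp_ms,
        "items": [{"id": item_id} for item_id in item_ids],
    }


def post_json(url, payload, timeout=10):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder candidates; the article sends 100 science fiction movie IDs.
candidates = ["1580", "movie-b", "movie-c"]
body = build_rank_request("id1", "alice", "alice1", candidates, 1683968400000)
print(json.dumps(body))
```

With Metarank running, `post_json(RANK_URL, body)` would send the request and return the re-ranked items.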

Metarank returns a list of item IDs with recommendation scores. This time, for example, the movie with ID '1580' had the top score of '1.022 ~'.
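
Picking the top entries out of such a response might look like the sketch below. The response shape (an "items" list whose entries have "item" and "score") is an assumption, and apart from the truncated score quoted above for '1580', the IDs and scores are made up.

```python
# Hypothetical response shape; only the "1580" score echoes the article
# (truncated there), the other entries are placeholders.
sample_response = {
    "items": [
        {"item": "movie-b", "score": 0.75},
        {"item": "1580", "score": 1.022},
        {"item": "movie-c", "score": 0.5},
    ]
}


def top_items(response, n):
    """Return the n highest-scoring (item ID, score) pairs."""
    ranked = sorted(response["items"], key=lambda e: e["score"], reverse=True)
    return [(entry["item"], entry["score"]) for entry in ranked[:n]]


top = top_items(sample_response, 2)
print(top)
```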

Consider displaying this response together with the original movie data as shown below.

Metarank returned 100 recommendations, but only 12 are displayed on the screen, so send those 12 to the 'Feedback API'. This lets Metarank take into account the movies that the user saw but did not choose.

Then, also through the Feedback API, send the fact that the user clicked on the movie with ID '1580'.
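
The two feedback steps, the 12 displayed movies and the click on '1580', could be prepared roughly as follows. The feedback URL, the event keys, the "click" type name, and the user, session, and ranking IDs are assumptions, and the 99 other movie IDs are placeholders. The sketch only builds the two payloads. Each would then be sent as a JSON POST to the feedback endpoint.

```python
import json

FEEDBACK_URL = "http://localhost:8080/feedback"  # assumed endpoint path


def build_feedback(ranked_ids, shown, clicked_id, user, session, ranking_id, ts):
    """Build the display event for the first `shown` items and the click event."""
    displayed = ranked_ids[:shown]
    if clicked_id not in displayed:
        raise ValueError("the clicked item must be one of the displayed items")
    ranking = {
        "event": "ranking",
        "id": ranking_id,
        "user": user,
        "session": session,
        "timestamp": ts,
        "fields": [],
        "items": [{"id": item_id} for item_id in displayed],
    }
    click = {
        "event": "interaction",
        "type": "click",              # assumed interaction type name
        "id": ranking_id + "-click",  # hypothetical event ID
        "ranking": ranking_id,        # links the click to what was displayed
        "item": clicked_id,
        "user": user,
        "session": session,
        "timestamp": ts + 2000,       # two seconds later, arbitrary
        "fields": [],
    }
    return ranking, click


# 100 placeholder IDs standing in for the ranked result; only "1580" is real.
ranked_ids = ["1580"] + [f"movie-{i}" for i in range(1, 100)]
ranking, click = build_feedback(
    ranked_ids, 12, "1580", "test-user", "test-session", "test-ranking",
    1683968400000,
)
print(len(ranking["items"]), "items displayed, clicked:", click["item"])
print(json.dumps(click))
```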

When I requested recommendation scores again for the same 100 movies, the score rose to '3.01 ~', about three times higher, which shows that the ranking is personalized for the user. The scores of movies that share actors, directors, or other attributes with No. 1580 are also said to improve.
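
The 'about three times' figure can be checked with quick arithmetic. Both scores are the truncated values quoted above, so the result is only approximate.

```python
# Truncated scores quoted in the article, before and after the click feedback.
score_before = 1.022
score_after = 3.01

ratio = score_after / score_before
print(f"score increased by a factor of about {ratio:.2f}")
```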

By default, training results are kept in memory, so the data is lost when Metarank exits. It is also possible to run multiple Metarank instances to ensure scalability.

May 13, 2023 18:00:00 in Review, Software, Posted by log1d_ts