Wikipedia text compressed to 114MB in the Hutter Prize, a contest to compress 1GB of human knowledge as much as possible

The Hutter Prize, which has been held since 2006 with funding from computer scientist Marcus Hutter, aims to encourage research on artificial intelligence (AI). Saurabh Kumar, who succeeded in compressing the contest's 1GB file to approximately 114MB, received a prize of 5,187 euros (about 820,000 yen).

500'000€ Prize for Compressing Human Knowledge

The Hutter Prize replaces the vague concept of 'intelligence' with a measurable number, file size, based on the idea that 'good compression is closely related to intelligent behavior.' It was started to encourage the development of compression programs that are as smart as possible, as a path to AGI (artificial general intelligence).
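As a rough illustration of using file size as a score, the sketch below compares three general-purpose compressors from the Python standard library on a small made-up sample. These are not contest entries, and the sample string is a placeholder rather than real enwik data; the point is only that a smaller output counts as a better result.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the compressed size in bytes for each stdlib compressor."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


# Placeholder text, not taken from enwik8 or enwik9.
sample = ("The Hutter Prize rewards better compression of Wikipedia text. " * 200).encode("utf-8")

sizes = compressed_sizes(sample)
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    # Smaller is better: the score is simply the number of bytes left.
    print(f"{name}: {size} bytes ({size / len(sample):.2%} of original)")
```

Highly repetitive text like this sample shrinks far more than real encyclopedia text would, so the ratios printed here say nothing about what is achievable on the contest file.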

The prize money is provided by computer scientist Marcus Hutter. Initially, the target was the 100MB text file 'enwik8' and the total prize money was 50,000 euros (about 8 million yen). Since 2020, the target has been the 1GB text file 'enwik9', and the total prize money has been increased to 500,000 euros (approximately 80 million yen).

Because the aim is to promote the development of universally intelligent compression programs, the contents of enwik9 are extracted from the online encyclopedia Wikipedia, which serves as a general-purpose corpus of human knowledge. In addition, the test machine environment is restricted: the compressor must run on a single CPU core, use less than 10GB of memory and less than 100GB of HDD space, and finish within 50 hours.

Saurabh Kumar compressed enwik9 using fast cmix, reducing the total size to approximately 114MB (114,156,155 bytes) and winning the award. To receive the prize, an entry must improve on the previous record by at least 1%, and the prize is 5,000 euros for each 1% of improvement. Kumar, who improved on the previous record by 1.04%, was awarded 5,187 euros (about 820,000 yen).

To win the next award, the compressed size must be less than 113,014,593 bytes.
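The prize arithmetic can be sketched in a few lines. The function names below are hypothetical, and the formula assumes the payout scales linearly with the relative size reduction, which matches the "5,000 euros per 1%" rule and the 500,000 euro pool described above. How the contest treats an entry landing exactly on the 1% boundary is also an assumption here.

```python
PRIZE_POOL_EUR = 500_000      # total prize money stated in the article
RECORD_BYTES = 114_156_155    # current record stated in the article


def next_threshold(record_bytes: int) -> int:
    """99% of the current record, rounded down to whole bytes."""
    return record_bytes * 99 // 100


def prize_eur(previous_bytes: int, new_bytes: int) -> float:
    """Payout for beating the previous record (assumed linear in the gain)."""
    # Integer comparison avoids floating-point trouble at the 1% boundary.
    if new_bytes * 100 > previous_bytes * 99:
        return 0.0  # less than 1% improvement: no prize
    return PRIZE_POOL_EUR * (previous_bytes - new_bytes) / previous_bytes


print(next_threshold(RECORD_BYTES))   # 113014593, the figure quoted above
print(prize_eur(1_000, 990))          # a 1% gain on a toy record pays 5000.0
print(prize_eur(1_000, 995))          # a 0.5% gain pays 0.0
```

Under this formula, the 5,187 euro payout corresponds to an improvement of about 1.037%, which is consistent with the rounded 1.04% reported above.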

in Note, Posted by logc_nt