Why is the database character limit set to 191 characters?

Looking at databases in the world, there may be a limit of '191 characters or less' on the number of characters of data to be stored. Evan Thaler, CTO of Grouparoo, an open source data synchronization tool, explains why the limit is limited to 191 characters.

Grouparoo Blog: Why do database columns have a character length of 191?


Mr. Thaler first explained that modern database systems can be set to store unlimited characters, and that the limit on the number of characters will improve search speed. For example, if you want to find a user whose email address line is '[email protected]', you have to check whether the data matches one by one without any ingenuity. As the amount of data increases, so does the execution time.

Therefore, if you instruct the database system to create a 'search index', pre-calculation will be performed when writing data, and although the writing time and disk usage will increase, it is possible to perform a search at high speed. This trade-off is valuable because most applications have more reads than writes. And when creating this search index, you can improve the performance by specifying 'how many characters of data will be entered' in advance.

Theoretically, there is no limit to this number of characters, and MySQL , the world's most popular database system, once imposed a limit of 255 characters. According to Mr. Taller, although there are various theories about how it became 255 characters, 'the maximum number that can be represented by 8 bits is efficient', 'ensure compatibility with database systems before MySQL', 'index file is a single old file system' 'I wanted to put it in the page block of.' In any case, the performance of MySQL was tuned on the assumption that the limit was 255 characters, and the later database system also set the limit to 255 characters following MySQL.

After that, with the spread of pictograms, it will be necessary to support new character codes. Initially, MySQL was designed with a character code that represents one character in 3 bytes, so the index has only 767 bytes, which fits 255 characters x 3 bytes. On the other hand, in order to use pictograms, it is necessary to use a new character code called 'utf8mb4' that represents one character in 4 bytes, and as a result, the number of characters is newly restricted to 767/4, which is 191 characters or less. It is said that it became.

It is possible to represent all characters with modern character codes such as utf8mb4, and although it is a past practice to limit the index by the number of characters, many applications limit 191 characters to ensure compatibility. Is used, says Thaler.

in Software, Posted by log1d_ts