Work is underway to standardize ``UUIDv7,'' which does not cause performance problems even if it is set as a database primary key even though it is a UUID.



UUID is an abbreviation for 'Universally Unique Identifier' and is unique in that there are no duplicates anywhere in the world. There are several versions of UUID due to historical circumstances, but Buildkite, a company that provides CI/CD services, explains v7, which is currently undergoing standardization work as of October 2023.

Goodbye to sequential integers, hello UUIDv7!

https://buildkite.com/blog/goodbye-integers-hello-uuids



A UUID is a 128-bit number, and by creating most of it at random, the probability of duplication is as close to zero as possible, making it virtually a unique ID that does not exist anywhere in the world. However, since the numbers are completely random, using them as primary keys would cause performance problems in the database, and it was necessary to devise measures such as having a serial number key for the database separate from the UUID.

For example, in the case of Buildkite products, recent data is accessed more frequently than older data. When the primary key is a UUID, new data is randomly distributed in the index, which means that if you want to retrieve the latest data from a large dataset, you have to scan many database index pages, which reduces the cache hit rate. will decrease. On the other hand, creating primary keys with sequential numbers has the advantage that the most recent data is simply placed at the right end of the index, creating a structure that is easier to cache.

Methods to alleviate these disadvantages have been sought for over 10 years, and a typical solution is to add a timestamp before the random number sequence, such as X (formerly Twitter)'s Snowflake ID and Instagram's Sharding ID . can be mentioned.



In 2022, a new standard proposal for UUID that utilizes these timestamps began to attract attention, and work on standardizing it as UUIDv7 has begun. The structure of UUIDv7 is shown below, where the first 48 bits are a Unix timestamp in milliseconds, then 6 bits are used to indicate the version and variant of the UUID, and the other 74 bits are a random number. It has become.



Using UUIDv7 as a primary key eliminates the need to generate and manage sequential keys for the database, simplifying application logic. Furthermore, since UUIDv7 complies with the standard UUID format, code and libraries for conventional UUIDs can be used as is, and versions can be easily migrated.

Many of the alternatives, such as Snowflake, are 64-bit, so using 128-bit UUIDs does take up a lot of storage space, but this is negligible when considering the storage usage per database row. It's a problem. Considering these benefits, Buildkite stated that they decided to adopt UUIDv7.

in Software, Posted by log1d_ts