Unicode has an alphabet that is neither uppercase nor lowercase



The large character set Unicode aims to include every character in the world, and it contains many unique emojis as well as some unidentified symbols. Software developer Raymond Chen explains on a Microsoft developer blog that Unicode also contains an 'alphabet that is neither uppercase nor lowercase'.

What has case distinction but is neither uppercase nor lowercase? - The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20241031-00/?p=110443



Unicode contains characters from various languages, including the Latin alphabet, but only four letters in the Latin alphabet have a separate form registered in addition to 'uppercase' and 'lowercase'. According to Chen, these are the following four characters: 'DZ' (U+01F1), 'DŽ' (U+01C4), 'LJ' (U+01C7), and 'NJ' (U+01CA). Although each one looks like two characters when written out, as in 'DZ' and 'NJ', each is a single character specified by a single Unicode code point.
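The single-code-point claim can be checked with a short script. The following is a minimal sketch using Python's standard `unicodedata` module; the list name `DIGRAPHS` and the helper `describe` are made up for this illustration. It also shows that NFKC compatibility normalization splits each digraph into ordinary letters, which is one reason these characters often end up looking like two characters in plain text.

```python
import unicodedata

# The four uppercase digraphs named in the article, written as escapes so
# that each one is guaranteed to be the single code point.
DIGRAPHS = ["\u01F1", "\u01C4", "\u01C7", "\u01CA"]  # DZ, DŽ, LJ, NJ


def describe(ch: str) -> tuple[int, str, int]:
    """Return (length, Unicode name, length after NFKC normalization)."""
    flattened = unicodedata.normalize("NFKC", ch)
    return len(ch), unicodedata.name(ch), len(flattened)


for ch in DIGRAPHS:
    length, name, flat_length = describe(ch)
    print(f"U+{ord(ch):04X} {name}: length {length}, {flat_length} after NFKC")
```

Each digraph has length 1 as a string, and only becomes two code points once compatibility normalization is applied.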



'DZ' is the seventh letter of the Hungarian alphabet. The first 10 letters of the Hungarian alphabet look like this:

A, Á, B, C, Cs, D, Dz, Dzs, E, É



These characters are written as 'dz', 'dž', 'lj', and 'nj' in lowercase.

Furthermore, these characters can be written in yet another way: 'Dz', 'Dž', 'Lj', and 'Nj'. These forms are classified as 'Titlecase' rather than uppercase or lowercase. Searching for 'U+01F2' on rakko.tools, which can convert and search Unicode characters, shows the glyph 'Dz', and the 'General Category' field, which reads 'Uppercase_Letter' or 'Lowercase_Letter' for ordinary letters, reads 'Titlecase_Letter'.
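The same lookup can be reproduced locally instead of through a web tool. This is a minimal sketch with Python's standard `unicodedata` module; the names `TRIPLES` and `categories` are made up for this illustration. `unicodedata.category` returns the two-letter abbreviations of the General Category, where `Lu` is Uppercase_Letter, `Lt` is Titlecase_Letter, and `Ll` is Lowercase_Letter.

```python
import unicodedata

# (uppercase, titlecase, lowercase) code points for the four digraphs.
TRIPLES = [
    ("\u01F1", "\u01F2", "\u01F3"),  # DZ, Dz, dz
    ("\u01C4", "\u01C5", "\u01C6"),  # DŽ, Dž, dž
    ("\u01C7", "\u01C8", "\u01C9"),  # LJ, Lj, lj
    ("\u01CA", "\u01CB", "\u01CC"),  # NJ, Nj, nj
]


def categories(triple: tuple[str, str, str]) -> tuple[str, ...]:
    """Return the General Category abbreviation of each character."""
    return tuple(unicodedata.category(ch) for ch in triple)


for triple in TRIPLES:
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in triple)
    print(code_points, categories(triple))
```

For every row, the middle character reports `Lt`, matching the 'Titlecase_Letter' shown by the search tool.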



Title case refers to a style of writing in which the first letter of a word is capitalized. Therefore, when a character such as 'Dz' comes at the beginning of a capitalized word, for example at the start of a sentence, the titlecase form is used rather than the uppercase or lowercase form. However, according to comments from Hungarian speakers on Chen's developer blog, combined characters such as 'Dz' are hardly used as single characters in practice today; the letters are simply typed separately, much as the 'ch' in the English word 'technic' is two letters that represent a single sound.
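How the three forms relate can be seen by running a word through a language's built-in case conversions. Below is a minimal sketch in Python, assuming the word is typed with the single code point 'lj' (U+01C9); the Croatian word 'ljubav' is used purely as an illustrative example, and the helper name `case_forms` is made up for this sketch.

```python
def case_forms(word: str) -> dict[str, str]:
    """Return the uppercase, titlecase, and lowercase versions of a word."""
    return {"upper": word.upper(), "title": word.title(), "lower": word.lower()}


LJ_SMALL = "\u01C9"  # lj as a single code point
LJ_TITLE = "\u01C8"  # Lj as a single code point
LJ_CAPITAL = "\u01C7"  # LJ as a single code point

# Illustrative word: starts with the one-code-point digraph, not with 'l' + 'j'.
forms = case_forms(LJ_SMALL + "ubav")
for label, text in forms.items():
    print(label, text, [f"U+{ord(ch):04X}" for ch in text[:1]])

# The titlecase form is reported as neither uppercase nor lowercase.
print(LJ_TITLE.isupper(), LJ_TITLE.islower(), LJ_TITLE.istitle())
```

Here `str.title()` picks the titlecase code point for the first letter, while `str.upper()` picks the fully capitalized one, so the two results start with different characters.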

'This is one example of how the world is more complicated than you think. Aside from the fact that there are uppercase and lowercase letters in the alphabet, there are other cases that you didn't know about,' Chen said of the intriguing nature of language.

in Web Service, Posted by log1e_dh