Unicode has letters that are neither uppercase nor lowercase
Unicode is a large character set, and it includes letters that are classified as neither uppercase nor lowercase.
What has case distinction but is neither uppercase nor lowercase? - The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20241031-00/?p=110443
Unicode contains characters from many of the world's writing systems, including the Latin alphabet, and a few of its letters have a third case form registered in addition to 'uppercase' and 'lowercase'. According to Chen, four Latin letters fall into this group. Their uppercase forms are 'Ǳ' (U+01F1), 'Ǆ' (U+01C4), 'Ǉ' (U+01C7), and 'Ǌ' (U+01CA). Although each looks like two letters, such as 'D' followed by 'Z' or 'N' followed by 'J', each is a single character with its own Unicode code point.
These characters are written as 'ǳ' (U+01F3), 'ǆ' (U+01C6), 'ǉ' (U+01C9), and 'ǌ' (U+01CC) in lowercase.
Furthermore, these characters have a third written form: 'ǲ' (U+01F2), 'ǅ' (U+01C5), 'ǈ' (U+01C8), and 'ǋ' (U+01CB). These are classified as 'Titlecase' rather than uppercase or lowercase. For example, U+01F2 is named LATIN CAPITAL LETTER D WITH SMALL LETTER Z and has the general category Lt (Titlecase Letter).
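The classification described above can be checked with Python's standard unicodedata module. The following is only a minimal sketch: the helper name case_forms is hypothetical and not taken from Chen's post, and the results depend on the Unicode data bundled with the Python version in use.

```python
import unicodedata


def case_forms(ch: str) -> dict:
    """Return the code point, general category, and case mappings of one character.

    Hypothetical helper for illustration; expects a single-character string.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",
        "category": unicodedata.category(ch),  # "Lt" means Titlecase Letter
        "upper": ch.upper(),
        "lower": ch.lower(),
        "title": ch.title(),
    }


# The four titlecase digraphs, written as escapes so they cannot be
# confused with ordinary two-letter sequences such as "D" + "z".
TITLECASE_DIGRAPHS = ("\u01F2", "\u01C5", "\u01C8", "\u01CB")

for ch in TITLECASE_DIGRAPHS:
    info = case_forms(ch)
    print(
        info["code_point"],
        unicodedata.name(ch),
        "category:", info["category"],
        "upper:", f"U+{ord(info['upper']):04X}",
        "lower:", f"U+{ord(info['lower']):04X}",
    )
```

Each of the four characters should report a string length of 1 and a category of Lt, with uppercase and lowercase mappings that differ both from each other and from the character itself.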
Title case refers to a style of writing in which only the first letter of a word is capitalized. Therefore, when a letter such as 'ǲ' begins a capitalized word, for example at the start of a sentence, it is written in its titlecase form rather than in uppercase or lowercase. However, according to comments from Hungarian speakers on Chen's developer blog, it is hard to say that single-character digraphs such as 'ǲ' are actually used in practice today. In their view the situation is much like the English word 'technic', which uses the two letters 'ch' for a single sound.
'This is one example of how the world is more complicated than you think. Aside from the fact that there are uppercase and lowercase letters in the alphabet, there are other cases that you didn't know about,' Chen said of the intriguing nature of language.
in Web Service, Posted by log1e_dh