Most operating systems in use today represent characters with ASCII-based code sets. ASCII originally used a 7-bit code to represent upper- and lowercase letters, the digits 0 ~ 9, and several control characters such as NUL and EOT. Western European users wanted a code that could represent all the characters of their national character sets, so ASCII was extended to 8 bits, adding 128 more characters to represent various Western European languages in addition to English. This extended 8-bit code is the ISO 8859-1 (Latin-1) code set.

To support Asian languages with thousands of ideographs, Java needed a more general solution: Unicode. Unicode is an ISO-standard 16-bit character set that supports 65,536 distinct characters, of which about 21,000 are dedicated to the ideographs used in Chinese, Japanese, and Korean. The ISO Latin-1 code set occupies the first 256 characters of Unicode and is therefore a subset of Unicode, just as ASCII is a subset of ISO Latin-1.

Internally, Java uses Unicode encoding, representing each character with 2 bytes. If only ASCII or ISO Latin-1 characters are used, the encoded values are the same, but each character still takes an extra byte. On UNIX, Windows, and Macintosh systems, the default character sets are 8-bit. When Java reads a character from one of these systems, the operating system supplies an 8-bit byte, but Java always stores it in a 16-bit data type and treats it as 16 bits. Java can use the platform default encoding to perform the corresponding conversions correctly whenever you need to read or write 16-bit Unicode characters and 8-bit ASCII characters; and if you need some different, special handling, there are ways to specify that as well.
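The relationships described above can be observed directly in Java: a `char` is a 16-bit Unicode value, and the first 256 code points coincide with ISO 8859-1. A minimal sketch (the class name `CharsetDemo` is just an illustration):

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        // A Java char is a 16-bit Unicode value.
        char c = 'é';                      // U+00E9, within ISO 8859-1's range
        System.out.println((int) c);       // prints 233

        // Encoding the same string with different charsets:
        String s = "é";
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8   = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(latin1.length); // 1 byte: Latin-1 maps onto Unicode's first 256 code points
        System.out.println(utf8.length);   // 2 bytes in UTF-8

        // Decoding the Latin-1 byte recovers the original character,
        // because the byte value equals the Unicode code point.
        String round = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(round.equals(s)); // prints true
    }
}
```

When reading from an 8-bit system, the same conversion happens automatically: an `InputStreamReader` constructed with a charset (or with the platform default) widens each incoming byte sequence into 16-bit `char` values.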