Reason for character set conversion

xiaoxiao 2021-03-06

The schematic of the export and import process relates four character sets to one another, and an inconsistency among these four is exactly what causes Oracle to perform character set conversion. The four are:

The source database character set;
The user session character set during export;
The user session character set during import;
The target database character set.

During export and import, an inconsistency between these character sets may cause Oracle to convert the data from one character set to another, as follows.

During export, if the source database character set differs from the export user session character set, a character set conversion takes place, and the ID of the export user session character set is stored in a few bytes of the header of the exported binary DMP file. Data may be lost during this conversion.

Example 1: The source database uses ZHS16GBK and the export user session uses US7ASCII. ZHS16GBK is a 16-bit character set and US7ASCII is a 7-bit character set, so Chinese characters have no equivalent in US7ASCII and every Chinese character is replaced by "??". The DMP file produced by this conversion has already lost data.

Example 2: The source database uses ZHS16GBK and the export user session uses ZHS16CGB231280. Because ZHS16GBK is a superset of ZHS16CGB231280, most characters convert correctly; only the characters that fall outside ZHS16CGB231280 become "??". In the reverse case, where the source database uses ZHS16CGB231280 and the export user session uses ZHS16GBK, every character converts correctly.

During import, the conversions run in the opposite direction to export and are not described again in detail. The DMP file written by export records the export user session character set. During import, the data is converted from the DMP file character set (that is, the export user session character set) to the import user session character set. If this conversion cannot be completed correctly, the import into the target database cannot be completed.
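The loss described in the two examples can be reproduced outside Oracle. The following is a minimal sketch in Python, assuming that Python's gbk, gb2312, and ascii codecs are close enough stand-ins for ZHS16GBK, ZHS16CGB231280, and US7ASCII. The helper name convert is hypothetical, and Python substitutes a single "?" per unmappable character, which may differ from what Oracle writes.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode data from src to dst, replacing unmappable characters with '?'."""
    return data.decode(src).encode(dst, errors="replace")


# Assumption for illustration: "中" exists in both GB2312 and GBK,
# while the traditional form "國" exists in GBK only.
gbk_data = "中國".encode("gbk")

# Superset to 7-bit set: every Chinese character is lost.
to_ascii = convert(gbk_data, "gbk", "ascii")

# Superset to subset: only the GBK-only character is lost.
to_gb2312 = convert(gbk_data, "gbk", "gb2312")

# Subset to superset: nothing is lost.
to_gbk = convert("中".encode("gb2312"), "gb2312", "gbk")

print(to_ascii)
print(to_gb2312.decode("gb2312"))
print(to_gbk.decode("gbk"))
```

The point of the sketch is only the direction of the loss: moving toward a smaller character set replaces characters, while moving toward a superset does not.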

The correct conversion of the character set

Usually we do not want characters to be converted when using Oracle's export and import, but sometimes the conversion is necessary. For example, suppose the ZHS16CGB231280 character set was selected when the Oracle database was installed. Because this is a small Chinese character set, some Chinese characters cannot be represented in it. The fix is to move to the ZHS16GBK character set, and that requires a character set conversion.

To ensure that the Oracle character set is either not converted or converted correctly, check three things: whether the source database character set matches the export user session character set, whether the source database character set matches the target database character set, and whether the target database character set matches the import user session character set. If all four character sets are the same, Oracle does not need to perform any conversion during export and import.
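The consistency check can be sketched in a few lines of Python. This is an illustration only: both function names are hypothetical, and it assumes the session character sets are taken from NLS_LANG values of the form language_territory.charset while the database character set names are supplied by hand.

```python
def charset_from_nls_lang(nls_lang):
    """Return the character set part of an NLS_LANG value, or None if it has none."""
    if not nls_lang or "." not in nls_lang:
        return None
    charset = nls_lang.rsplit(".", 1)[1].strip().upper()
    return charset or None


def conversion_points(source_db, export_session, import_session, target_db):
    """List the export/import stages at which the character set changes."""
    stages = [
        ("export: source database -> export session", source_db, export_session),
        ("import: DMP file -> import session", export_session, import_session),
        ("import: import session -> target database", import_session, target_db),
    ]
    return [label for label, before, after in stages
            if before.upper() != after.upper()]


# Hypothetical setup: both databases use ZHS16GBK, but the export session does not.
export_cs = charset_from_nls_lang("AMERICAN_AMERICA.US7ASCII")
import_cs = charset_from_nls_lang("SIMPLIFIED CHINESE_CHINA.ZHS16GBK")
for stage in conversion_points("ZHS16GBK", export_cs, import_cs, "ZHS16GBK"):
    print("conversion at", stage)
```

An empty result means all four character sets agree and no conversion is expected; any listed stage is a place where data may be altered.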

The following measures are available to check the database character set:

View the initXXX.ora file, or run the SQL statement: SELECT name, value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET'; For the export and import user session character sets, you can view or modify NLS_LANG in the registry on Windows; on UNIX, you can view or modify it through the user's NLS_LANG environment variable.

Note in particular that the Oracle database character set is normally fixed when the database is created. Once user data has been stored, the character set should not be changed, because the existing data was stored using the original character set and can no longer be interpreted correctly after the change. If you really must change the character set, you can do so as follows:

1. Back up the database, then delete the original data. A physical backup works; if you use export, make sure the character set is either not converted or converted without loss.
2. As the INTERNAL user, update the character set in sys.props$, where dest.charset stands for the name of the target character set: UPDATE sys.props$ SET value$ = 'dest.charset' WHERE name = 'NLS_CHARACTERSET'; COMMIT;
3. Restart the database.
4. Restore the data.

Conversion between the following character sets is feasible:

Conversion from a character set to its superset is possible, for example from ZHS16CGB231280 to ZHS16GBK; converting from a superset to its subset loses part of the data. A double-byte character set that holds only English character data can also be converted to a single-byte character set; for example, ZHS16GBK data that is English only converts correctly to US7ASCII. Single-byte character sets with the same coding range can usually be converted to each other. Note that "no data loss" here means that after character set A is converted to character set B, either B can be converted back to A correctly, or B correctly represents the data that came from A.
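The round-trip definition of "no data loss" can be checked directly. The sketch below is an assumption-laden illustration, not Oracle's own logic: round_trips is a hypothetical helper, and Python's gb2312, gbk, ascii, latin-1, and iso8859-15 codecs stand in for the corresponding Oracle character sets. The result applies only to the sample text passed in.

```python
def round_trips(text: str, charset_a: str, charset_b: str) -> bool:
    """True if text stored in A survives conversion to B and back to A unchanged."""
    original = text.encode(charset_a)
    in_b = original.decode(charset_a).encode(charset_b, errors="replace")
    back_in_a = in_b.decode(charset_b).encode(charset_a, errors="replace")
    return back_in_a == original


print(round_trips("中文", "gb2312", "gbk"))          # subset to superset
print(round_trips("Oracle", "gbk", "ascii"))         # double-byte set holding English only
print(round_trips("café", "latin-1", "iso8859-15"))  # single-byte sets, overlapping range
print(round_trips("中文", "gbk", "ascii"))           # Chinese data forced into 7 bits
```

Only the last call reports a loss, because the Chinese characters have no equivalent in the 7-bit character set.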

The influence of the character set on the program

Character sets can be divided, according to the number of bytes needed to represent one character, into single-byte and multi-byte character sets. Single-byte character sets are further divided into 7-bit and 8-bit character sets: US7ASCII is a single-byte 7-bit character set, and WE8ISO8859P1, defined by ISO 8859-1, is a single-byte 8-bit character set. Multi-byte encodings are likewise divided into fixed-length encodings (a length of two bytes or more) and variable-length encodings. Among the multi-byte character sets, ZHS16GBK, ZHS16CGB231280, JA16SJIS, and others represent one character in two bytes and are called double-byte character sets.

An English letter is one character, but how many characters is a Chinese character? A Chinese character occupies two bytes, but how many characters it counts as depends on the database character set. If the database uses the single-byte US7ASCII, one Chinese character counts as two characters; if the database uses the double-byte ZHS16GBK, one Chinese character counts as one character. Oracle's SUBSTR function demonstrates this with the string '东北大学' ("Northeastern University").

When the database character set is US7ASCII: SELECT SUBSTR('东北大学', 1, 2) FROM dual; returns '东' ("East").

When the database character set is ZHS16GBK: SELECT SUBSTR('东北大学', 1, 2) FROM dual; returns '东北' ("Northeast").
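The difference between the two SUBSTR results comes down to whether a "character" is counted as one byte or as one double-byte unit. A minimal sketch in Python, assuming the gbk codec stands in for the stored ZHS16GBK bytes of '东北大学' ("Northeastern University"); the variable names are illustrative only:

```python
raw = "东北大学".encode("gbk")  # 4 Chinese characters stored as 8 bytes

# Single-byte view: each byte counts as one character, so the first two
# "characters" are just the two bytes of the first Chinese character.
single_byte_result = raw[:2].decode("gbk")

# Double-byte view: each Chinese character counts as one character.
double_byte_result = raw.decode("gbk")[:2]

print(len(raw), single_byte_result, double_byte_result)
```

Taking two bytes yields one Chinese character, while taking two characters yields two, which mirrors the two query results.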

Select the appropriate database character set

Consider the following when selecting the database character set:

1. Which languages the database needs to support. When selecting a character set for the database, you will often find that several are suitable for your current language needs; for Simplified Chinese, for example, there are ZHS16GBK, ZHS16CGB231280, and others to choose from. Which should you choose? Take the future requirements of the database into account. If you know the database will be extended to support other languages, choosing a character set with a wider range is the better idea.

2. Interaction between the database character set, system resources, and applications. The database character set should ensure a seamless connection between the operating system and the application. If the selected character set is not one that the operating system supports, the system has to convert characters between the two, and some characters may be lost in the process. When converting from character set A to character set B, each character in A must have an equivalent in B; otherwise it is replaced by "?". In this sense, two character sets with the same coding range can be converted to each other.

Character set conversion also affects system performance, so make sure the client and the server use the same character set. Avoiding the conversion improves system performance to some degree.

3. System performance requirements. Different database character sets affect database performance to some degree. For the best performance, the chosen database character set should avoid character conversion and should be the most efficient encoding for the required language. Typically, single-byte character sets perform better than multi-byte character sets and need less space.

4. Other restrictions. When selecting a character set for the database, consult the documentation for your Oracle version to check Oracle's restrictions on particular character sets. In Oracle 8.1.5, for example, the following character sets cannot be used: JA16EUCFIXED, ZHS16GBKFIXED, JA16DBCSFIXED, KO16DBCSFIXED, ZHS16DBCSFIXED, JA16SJISFIXED, ZHT32TRISFIXED.
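The space argument in point 3 can be made concrete by counting encoded bytes. This is a rough sketch, assuming Python's ascii and gbk codecs approximate US7ASCII and ZHS16GBK; the utf-8 comparison is an extra assumption added only to show that encodings differ in efficiency for the same language, and stored_bytes is a hypothetical helper.

```python
def stored_bytes(text: str, codec: str) -> int:
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(codec))


english = "Oracle"
chinese = "东北大学"

for codec in ("gbk", "utf-8"):
    print(codec, stored_bytes(english, codec), stored_bytes(chinese, codec))
```

For the Chinese sample, the double-byte encoding needs fewer bytes than the variable-length one, which is the kind of difference to weigh when choosing the most efficient encoding for the required language.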

In summary, a correct understanding of how Oracle converts character sets lets us avoid unnecessary trouble and data loss. Making sensible use of the conversion process also helps us move correctly from one character set to another to meet the needs of different applications.

When reprinting, please cite the original address: https://www.9cbs.com/read-116065.html
