Sybase ASE character set

xiaoxiao2021-03-06  39

1. Concepts

1.1 What is a character set?

A character set is a specific collection of characters (letters, digits, symbols, non-printing characters, and so on) together with the internal code assigned to each character. A character set usually contains the characters of an alphabet, for example the Latin alphabet used for English. To use Latin characters you must configure a character set from the language group that contains English. Why a language group? Because a character set is tied to a particular operating system platform and to the languages it supports. A collection of languages is called a language group, and it may contain one or more languages. A native character set is an encoding, defined for a particular operating system platform, of one or more languages in a particular language group.

A client/server system can process multilingual data, but all the languages must belong to the same language group. For example, as Table 1-1 shows, if the server's character set belongs to group 1, the database can store French, German, English, and the other languages in that group. Japanese and French, however, cannot both be stored in this database.

One character set deserves special attention: Unicode, an international character set that supports more than 650 of the world's languages. Unicode lets you use languages from different language groups on the same server.

Table 1-1 Languages and character sets supported by Adaptive Server

Note: Because the first 128 (decimal) characters of every character set in the table include the Latin alphabet, all of these character sets support English. The characters beyond the first 128 differ from one character set to another and are used to support the characters of different native languages.

1.2 What is a sort order?

Each character set has one or more sort orders, which Adaptive Server uses to order data.
A sort order is closely tied to a particular language or language group and to a particular character set. Different languages sort the same characters differently, so sorting correctly requires a sort order designed for the language. The sort orders of a character set are defined in sort order definition files located in that character set's directory. The character sets and their available sort orders are listed in Table 1-2.

Table 1-2 Available sort orders

Sort orders are used for:
- creating indexes
- storing data in indexed tables
- processing ORDER BY clauses

The different types of sort order are as follows.

Binary order: At least one binary sort order is provided for every character set. It orders characters by the numeric value of the code (the "binary" code) that represents each character in the character set. Binary order works well for the first 128 characters of each character set and for Asian languages. When a character set supports more than one language, binary order can give incorrect results, and another sort order should be chosen.

Dictionary order, case-sensitive, accent-sensitive: Uppercase and lowercase letters are sorted separately. Dictionary order recognizes the various accented forms of a letter and sorts them after the associated unaccented letter.

Dictionary order, case-insensitive, accent-sensitive: Data is sorted in dictionary order, but uppercase and lowercase letters are treated as equivalent, so they are intermixed in the sorted results. This is useful for avoiding duplicate entries in a table.

Dictionary order, case-insensitive, accent-sensitive, with preference: Case is ignored when comparing, but when all other conditions are equal, uppercase letters take precedence (that is, the uppercase form appears first).

Using the case-insensitive order with preference can reduce performance when the column named in the ORDER BY clause matches the table's clustered index key, so it is not recommended unless you specifically need uppercase letters to sort before lowercase letters.

Dictionary order, case-insensitive, accent-insensitive: Accented letters are treated as equivalent to their unaccented counterparts, so accented and unaccented characters are intermixed in the sorted results.

1.3 What is character set conversion?

To preserve data integrity between client and server, data must be converted between character sets. The goal is to ensure that an "a" remains an "a" across machines and character sets. This process is character set conversion.

Character set conversion methods

Native character set conversion: Adaptive Server supports conversion between native character sets that belong to the same language group. If the server uses a native character set as its default, the client character sets must belong to the same language group; all data submitted by the clients can then be viewed on the server.

Figure 1-3 Server and client character sets in the same language group

In the figure, the languages and character sets used by the server and by the clients all belong to group 1 (see Table 1-1), so native character set conversion is used.

Conversion in a Unicode system: In a Unicode system, the client character set can be a native character set from any language group, because the server's default character set is UTF-8.

Figure 1-4 Character set conversion in a Unicode system

In the figure, data passing between the server and each client is converted correctly regardless of which language group each client's character set belongs to, because the ASE server's default character set is UTF-8.
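Why conversion is needed becomes clearer when looking at the bytes behind a single character. The following Python sketch is only an illustration: Python's built-in codecs stand in for the ASE character sets (`latin-1` is assumed here as the equivalent of ISO_1), and the helper name `reinterpret` is made up for this example. It is not how Adaptive Server is implemented.

```python
# Illustration only: Python codecs stand in for the ASE character sets.
# "latin-1" is assumed to correspond to ISO_1 (ISO 8859-1).

def reinterpret(data: bytes, wrong_charset: str) -> str:
    """Read bytes as if they were in wrong_charset, i.e. without conversion."""
    return data.decode(wrong_charset)

text = "\u00e9"  # e with acute accent

# The same character has different internal codes in different character sets.
as_cp850 = text.encode("cp850")
as_iso_1 = text.encode("latin-1")

# Without conversion, cp850 bytes read by an ISO_1 client give another character.
unconverted = reinterpret(as_cp850, "latin-1")

# With conversion (decode from the source set, encode into the target set),
# the character survives the trip.
converted = as_cp850.decode("cp850").encode("latin-1").decode("latin-1")

# A UTF-8 server can hold characters from different language groups at once.
mixed = "\u00e9\u4e2d".encode("utf-8")

print(as_cp850.hex(), as_iso_1.hex(), unconverted == text, converted == text)
```

The byte values differ between the two code pages, which is exactly the situation conversion exists to handle.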
Character set conversion types

Direct conversion: Supports conversion between two native character sets in the same language group. For example, Adaptive Server supports conversion between CP437 and CP850 because both belong to language group 1.

Unicode conversion: Can be applied to all native character sets. When converting between two native character sets, Unicode conversion uses Unicode as an intermediate character set. For example, to convert between the server default character set CP437 and the client character set CP860, CP437 is first converted to Unicode, and Unicode is then converted to CP860. Unicode conversion can be used both when the server default character set is UTF-8 and when it is a native character set. Unless the server default character set is UTF-8, you must explicitly configure the server to use Unicode conversion (see section 2.1, "How to configure the character set conversion type").

Which conversion type to choose depends on the type of system. In a non-Unicode system, the server and client character sets are native character sets, so Adaptive Server's direct conversion can be used. Some character sets, however, have no direct conversion, and for these Unicode conversion must be used, as shown in Table 1-5.

Table 1-5 Character set conversion methods

- If all the character sets used in the system are in column 1 of Table 1-5, use direct conversion, provided that all the character sets belong to the same language group.
- If all the character sets used in the system are in column 2 of Table 1-5, or some are in column 1 and some in column 2, the server must be configured for Unicode conversion, provided that all the character sets belong to the same language group.
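The two-step path of Unicode conversion (source character set to Unicode, then Unicode to target character set) can be sketched in a few lines of Python. This is an illustration under assumptions: Python's `cp437` and `cp860` codecs stand in for the ASE conversion tables, and `convert_via_unicode` is a hypothetical helper, not an ASE function.

```python
# Illustration only: Python codecs stand in for the ASE conversion tables.
from typing import Optional

def convert_via_unicode(data: bytes, source: str, target: str) -> Optional[bytes]:
    """Convert bytes from source to target, using Unicode as the pivot.

    Returns None when a character has no equivalent in the target set.
    """
    unicode_text = data.decode(source)        # step 1: source -> Unicode
    try:
        return unicode_text.encode(target)    # step 2: Unicode -> target
    except UnicodeEncodeError:
        return None

# U+00E9 (e with acute accent) exists in both code pages.
ok = convert_via_unicode(b"\x82", "cp437", "cp860")

# U+00F6 (o with diaeresis) is 0x94 in cp437 but has no code in cp860.
missing = convert_via_unicode(b"\x94", "cp437", "cp860")

print(ok, missing)
```

The second call shows that a pivot through Unicode does not remove the underlying limit: a character still has to exist in the target character set.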

In a Unicode system, where the server default character set is Unicode UTF-8, every conversion takes place between UTF-8 and the character set used by the client, so only Unicode conversion can be used.

2. Configuration

2.1 How to configure the character set conversion type

Disabling character set conversion

Run the following in isql:

    1> sp_configure "disable character set conversions", 1
    2> go

The default value of the "disable character set conversions" parameter is 0, which means character set conversion is enabled.

Configuring the conversion type

Set the "enable unicode conversions" parameter to 1 or 2. When it is set to 1, direct conversion or Unicode conversion is used; when it is set to 2, Unicode conversion is used. The default value is 0, which uses direct conversion. Run:

    1> sp_configure "enable unicode conversions", 1
    2> go

2.2 How to configure the server default character set

Direct method

The direct method uses the utilities provided by Sybase: on UNIX, the sqlloc command or an edited sqlloc.rs resource file; on Windows, the Server Configuration graphical management tool. Use the direct method only when one of the following holds:
- there is no user data in the server
- damage to the user data in the server is acceptable
- you are absolutely certain that the data in the server uses only the ASCII-7 character set

Indirect method

Compared with the direct method, the indirect method takes three steps:
1. Export the server's data with the bcp command.
2. Use one of the direct methods to configure the server's character set.
3. Use the bcp command with the -J parameter to load the data back into the server.

Ways to configure the server character set

sqlloc (UNIX platforms): Run sqlloc from the $SYBASE_OCS/bin directory. A graphical interface appears in which you can choose the language, character set, and sort order and complete the configuration.

Editing the sqlloc.rs resource file

Copy $SYBASE_ASE/init/sample_resource_files/sqlloc.rs to the $SYBASE_OCS/bin directory and edit the file as indicated below (the text after "---" is explanation and is not part of the file):

    sybinit.release_directory: /home/sybase       --- Sybase product installation path
    sqlsrv.server_name: SYB125                    --- name of the database server
    sqlsrv.sa_login: sa
    sqlsrv.sa_password:                           --- password of sa; leave blank if it is empty
    sqlsrv.default_language: us_english           --- language to configure
    sqlsrv.language_install_list: USE_DEFAULT
    sqlsrv.language_remove_list: USE_DEFAULT
    sqlsrv.default_characterset: cp850            --- character set to configure
    sqlsrv.characterset_install_list: USE_DEFAULT
    sqlsrv.characterset_remove_list: USE_DEFAULT
    sqlsrv.sort_order: binary                     --- sort order to configure

    # An example sqlloc resource file ...
    # sybinit.release_directory: USE_DEFAULT
    # sqlsrv.server_name: PUT_YOUR_SERVER_NAME_HERE
    # sqlsrv.sa_login: sa
    # sqlsrv.sa_password:
    # sqlsrv.default_language: french
    # sqlsrv.language_install_list: spanish, german
    # sqlsrv.language_remove_list: USE_DEFAULT
    # sqlsrv.default_characterset: cp437
    # sqlsrv.characterset_install_list: mac, cp850
    # sqlsrv.characterset_remove_list: USE_DEFAULT
    # sqlsrv.sort_order: dictionary

Save the modified sqlloc.rs file and run the following command:

    sqllocres -r sqlloc.rs

Watch the messages that appear on the screen; if no errors are reported, the configuration is complete.

"Server Configuration" graphical management tool (Windows platforms)

The "Server Configuration" tool provides an easy-to-use graphical interface for configuring the character set; simply follow the prompts in the tool. It is not described further here; refer to the relevant documentation.
2.3 How to configure the client default character set

Configuring the client's default character set means editing the locales.dat file in the $SYBASE/locales directory.

Open the file in a text editor on Windows or UNIX. You will see that the locale entries are grouped by operating system platform:

    ..
    [aix]
    locale = C, us_english, iso_1
    locale = En_US, us_english, iso_1
    locale = en_US, us_english, iso_1
    locale = default, us_english, iso_1
    locale = En_US.IBM-850, us_english, cp850
    locale = en_JP, us_english, eucjis
    locale = fr_FR, french, cp850
    [axposf]
    locale = C, us_english, iso_1
    ..
    ; Use Posix Locales, straight from the Posix Guidelines
    locale = en_US.88591, us_english, iso_1
    locale = fr_FR, french, iso_1
    locale = zh_CN, chinese, eucgb
    locale = zh_TW, tchinese, euccns
    locale = ko_KR, korean, eucksc
    locale = us_english.utf8, us_english, utf8
    locale = default, us_english, iso_1
    ..

The name of the operating system for each group is enclosed in "[ ]". Note that every group contains a line of the form "locale = default, ...". To change the client's default character set, edit this line.

For example, suppose the server runs on a Sun platform, the server's language is English, and its character set is CP850. To make the client character set match the server's, first find the [sun] operating system group, then change the "locale = default, ..." line to "locale = default, us_english, cp850".
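The structure of locales.dat, and the role of the "default" line, can be modelled with a short Python sketch. This is a simplified model based only on the description above: the function names `parse_locales` and `client_charset` are hypothetical, the sample text is a shortened stand-in for a real file, and the actual lookup performed by the Sybase client libraries may differ in detail.

```python
# A simplified model of locales.dat; all names here are illustrative only.
from typing import Dict, Optional, Tuple

Entry = Tuple[str, str]  # (language, character set)

def parse_locales(text: str) -> Dict[str, Dict[str, Entry]]:
    """Map platform name -> locale name -> (language, charset)."""
    platforms: Dict[str, Dict[str, Entry]] = {}
    current: Optional[Dict[str, Entry]] = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop ';' comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = platforms.setdefault(line[1:-1].lower(), {})
        elif current is not None and line.lower().startswith("locale"):
            _, value = line.split("=", 1)
            parts = [part.strip() for part in value.split(",")]
            if len(parts) >= 3:
                current[parts[0]] = (parts[1], parts[2])
    return platforms

def client_charset(platforms: Dict[str, Dict[str, Entry]],
                   platform: str, lang: Optional[str] = None) -> str:
    """Use the entry named by LANG if it is listed, else the 'default' entry."""
    section = platforms[platform.lower()]
    entry = section.get(lang) if lang else None
    if entry is None:
        entry = section["default"]
    return entry[1]

SAMPLE = """
[sun]
; comment lines start with a semicolon
locale = C, us_english, iso_1
locale = fr, french, iso_1
locale = default, us_english, cp850
"""

parsed = parse_locales(SAMPLE)
print(client_charset(parsed, "sun"))          # falls back to the default entry
print(client_charset(parsed, "sun", "fr"))    # entry matching LANG=fr
```

In this model, editing the "default" line changes the result only for clients whose LANG value has no entry of its own, which is why the LANG special case discussed later in this section matters.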

Before modification:

    [sun]
    ; from JLE, KLE, CLE, OS/4.1.1, man setlocale();
    ; and Sun Software Internationalization Guide (P/N 800-5972-0)
    ; use setenv LC_CTYPE, LC_MESSAGES, LANG
    locale = C, us_english, iso_1
    locale = fr, french, iso_1
    locale = de, german, iso_1
    locale = tr, us_english, iso88599
    locale = zh, chinese, eucgb
    locale = zh_CN, chinese, eucgb
    locale = zh_TW, tchinese, euccns
    locale = ko, korean, eucksc
    locale = us_english.utf8, us_english, utf8
    locale = default, us_english, iso_1

After modification:

    [sun]
    ; from JLE, KLE, CLE, OS/4.1.1, man setlocale();
    ; and Sun Software Internationalization Guide (P/N 800-5972-0)
    ; use setenv LC_CTYPE, LC_MESSAGES, LANG
    locale = C, us_english, iso_1
    locale = fr, french, iso_1
    locale = de, german, iso_1
    locale = tr, us_english, iso88599
    locale = zh, chinese, eucgb
    locale = zh_CN, chinese, eucgb
    locale = zh_TW, tchinese, euccns
    locale = ko, korean, eucksc
    locale = us_english.utf8, us_english, utf8
    locale = default, us_english, cp850

Save the file to complete the change of the client character set.

One special case also needs explaining. To meet the needs of certain server-side applications, the environment variable LANG may be set on the server. How should the client character set be configured then? For example, suppose the server runs on Windows, uses the language us_english and the character set ISO_1, and has the environment variable LANG=C set.

To make the client character set match the server's, first find the [NT] operating system group, then add the line "locale = C, us_english, iso_1" to this group.

Before modification:

    [NT]
    locale = enu, us_english, iso_1
    locale = fra, french, iso_1
    locale = deu, german, iso_1
    locale = japanese, japanese, sjis
    locale = chs, chinese, eucgb
    locale = cht, tchinese, big5
    ; locale = kor, korean, eucksc
    locale = us_english.utf8, us_english, utf8
    locale = default, us_english, iso_1

After modification:

    [NT]
    locale = enu, us_english, iso_1
    locale = fra, french, iso_1
    locale = deu, german, iso_1
    locale = japanese, japanese, sjis
    locale = chs, chinese, eucgb
    locale = cht, tchinese, big5
    ; locale = kor, korean, eucksc
    locale = us_english.utf8, us_english, utf8
    locale = default, us_english, iso_1
    locale = C, us_english, iso_1

So, before changing the client character set, first check whether LANG is set on the server, and then decide how to modify the file.

2.4 How to choose an ASE character set that supports Simplified Chinese

ASE 12.5 has four character sets that support Chinese characters: CP936, EUCGB, UTF-8, and GB18030.

The EUCGB character set is based on the GB2312-80 coding standard. In its EUC (Extended Unix Code) form, the first byte ranges from 0xA1 to 0xFE (in practice only up to 0xF7) and the second byte from 0xA1 to 0xFE.

The CP936 character set is based on the GBK coding specification (the actual national standard is GB13000-90) and is an extension of GB2312. The first byte ranges from 0x81 to 0xFE; the second byte falls into two ranges, 0x40 to 0x7E and 0x80 to 0xFE. In the code region it shares with GB2312, the characters are exactly the same.

The GB18030 character set (national standard number GB18030-2000) is a new Chinese encoding standard released on March 17, 2000. It extends GB2312 with a one-, two-, and four-byte encoding architecture and covers 27,000 Chinese characters as well as the scripts of major ethnic minorities such as Tibetan, Mongolian, and Uyghur.
Sybase supports the GB18030 character set from ASE 12.5.0.3 onward. The UTF-8 character set is a transitional scheme for moving existing ASCII systems to Unicode. It represents each of the characters discussed here using 1 to 3 bytes (UTF-8 as a whole uses up to 4 bytes for characters outside the Basic Multilingual Plane). A Simplified Chinese character is generally 3 bytes long in UTF-8. Its most important advantage is that it supports more than 650 languages. One disadvantage is that Chinese characters need 50% more storage space (3 bytes instead of the 2 bytes used by double-byte character sets such as CP936). Another problem is that when sp_helptext displays the body of a stored procedure, half of a Chinese character may appear.
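The 50% figure and the "half character" effect can both be checked with a small Python sketch. It is only an illustration: `cp936` here is Python's alias for its GBK codec, and cutting the byte string at a fixed width is an assumed, plausible way for a half character to arise, not a description of how sp_helptext works internally.

```python
# Illustration only; "cp936" is Python's alias for its GBK codec.
text = "字符集"                           # three Simplified Chinese characters

cp936_size = len(text.encode("cp936"))    # 2 bytes per character
utf8_size = len(text.encode("utf-8"))     # 3 bytes per character
extra = (utf8_size - cp936_size) / cp936_size

# Cutting a byte string at a fixed width can split a character in half.
chunk = text.encode("utf-8")[:4]          # one whole character plus one stray byte
shown = chunk.decode("utf-8", errors="replace")

print(cp936_size, utf8_size, extra, shown)
```

The stray byte at the end of `chunk` cannot be decoded on its own, so it shows up as a replacement character rather than as the second Chinese character.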

In general, because EUCGB does not support Chinese characters outside the level-1 and level-2 character sets of the national standard, we recommend using the CP936 character set on both the server and the client. From ASE 12.5.0.3 onward you can also use the GB18030 character set, which supports some uncommon Chinese characters. The drawback is that only one sort order is available: binary, which is case-sensitive. So if you need a database that supports a Chinese character set and is not case-sensitive, the only option is to use UTF-8 as the server character set, with the client using CP936 or GB18030.

There is one more option: use the ISO_1 character set on both the server and the client. Although ISO_1 does not directly support Chinese characters, when both sides are set to ISO_1 the system performs no character conversion between client and server; the two bytes of each Chinese character are simply handled as two separate characters, which generally causes no problems. A LIKE query, however, may return incorrect results. Because the server matches the query condition one byte at a time, the second byte of one Chinese character combined with the first byte of the next Chinese character may form the internal code of the character being searched for, and the server then returns that row as a match.

2.5 How to view the server and client character sets

View the server character set:

    1> sp_helpsort
    2> go

View the client character set:

    1> select @@client_csname
    2> go

3. Error handling

3.1 Why does character set conversion fail?

1. When a character exists in the client character set but not in the server character set, or the other way round, Adaptive Server's character set conversion reports a conversion error.
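Two of the points from section 2.4, the narrower coverage of EUCGB and the false LIKE match under ISO_1, can be reproduced with Python's codecs. This is a sketch under assumptions: `gb2312`, `cp936`, and `latin-1` stand in for EUCGB, CP936, and ISO_1, the helpers `can_store` and `as_iso_1` are made up for this example, and a substring test plays the role of a LIKE '%...%' condition.

```python
# Illustration only: Python codecs stand in for the ASE character sets
# ("gb2312" for EUCGB, "cp936" for CP936, "latin-1" for ISO_1).

def can_store(ch: str, charset: str) -> bool:
    """True if the character has a code in the given character set."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

rare = "镕"   # a character outside GB2312 levels 1 and 2

def as_iso_1(text: str) -> str:
    """How an ISO_1 server sees cp936 bytes: one character per byte."""
    return text.encode("cp936").decode("latin-1")

stored = "中文"
raw = stored.encode("cp936")
# The second byte of the first character plus the first byte of the second
# character is itself the code of another Chinese character.
straddling = raw[1:3].decode("cp936")

char_level_match = straddling in stored                        # the correct answer
byte_level_match = as_iso_1(straddling) in as_iso_1(stored)    # what byte matching sees

print(can_store(rare, "gb2312"), can_store(rare, "cp936"))
print(char_level_match, byte_level_match)
```

The character in `straddling` never appears in the stored text, yet the byte-level comparison reports a match, which is the incorrect LIKE result described above.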
Users see the following error message:

    Msg 2402, Severity 16 (EX_USER): Error converting client characters into server's character set. Some character(s) could not be converted.

A conversion error prevents the INSERT or UPDATE statement from executing. If this happens, check the data for the offending characters and replace them.

2. When Adaptive Server encounters a conversion error in data being sent to the client, it replaces the suspect characters with ASCII question marks (?), and the query batch continues to completion. After the statement completes, Adaptive Server sends the message:

    Msg 2403, Severity 16 (EX_USER): WARNING! Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?').

3. When stored data is queried from a client, Chinese characters appear garbled, and the client shows no error message. This is a problem often encountered in applications, and the cause is a mismatch between the client and server character sets. To explain: suppose the server character set is ISO_1 and the client character set is also ISO_1, and all the data was entered into the server from that client. If the data is later queried from a client whose character set is CP850, the characters displayed on that client will inevitably be garbled.
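The three failure modes above have close analogues in Python's codec error handling, which makes them easy to try out. The sketch below is an analogy only, not the ASE implementation. In particular, case 3 is simplified to "the same bytes read under a different character set"; the real result also depends on the ISO_1 to CP850 conversion the server applies.

```python
# Illustration only: Python's codec error handling stands in for the server.
text = "中文"

# Case 1: strict conversion into a set that lacks the characters fails
# outright, comparable to Msg 2402 rejecting an INSERT or UPDATE.
try:
    text.encode("cp850")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Case 2: lenient conversion substitutes question marks and carries on,
# comparable to the warning in Msg 2403.
lenient = text.encode("cp850", errors="replace")

# Case 3 (simplified): cp936 bytes stored unchanged through an ISO_1/ISO_1
# connection are read by a cp850 client as unrelated characters, silently.
stored_bytes = text.encode("cp936")
seen_by_cp850_client = stored_bytes.decode("cp850")

print(strict_failed, lenient, seen_by_cp850_client)
```

Only the first two cases produce any signal; the third decodes without complaint, which matches the observation that the client shows no error message.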

When reprinting, please cite the original address: https://www.9cbs.com/read-51746.html
