CMS Chinese Code Problem Analysis and Solution

xiaoxiao2021-03-06  76


Opening note: some commenters believe that UTF-8 encoding solves every language-encoding problem. I do not agree, for two reasons. First, among Chinese users, Simplified and Traditional users usually keep to their own encodings and rarely need to share one. Second, a large number of existing websites use GB or BIG5 encoding, and MB / ICONV support for converting GB / BIG5 to UTF-8 is not ideal.

When developing a CMS you inevitably face language problems, mainly multibyte processing: character-set conversion, string-length calculation, and so on. Ordinary multibyte issues can generally be handled with the MB / ICONV modules, but for Chinese these two modules that ship with PHP are not perfect. The Chinese version of XOOPS uses the XCONV module (Iconv for Xoops). This module works by table lookup and replacement, which is inefficient, but there is no alternative. [XOOPS lets you choose the encoding freely; if you choose UTF-8, treat the rest of this article as light reading.]

This article will attempt the following analysis [in whatever order the available material allows, regardless of logic or chronology]: an introduction to the BIG5 garbling problem, with analysis and solutions; an introduction to the XCONV module and its use; multibyte issues and how to handle them.

BIG5 garbling problem: analysis and solution

Anyone who processes BIG5 encoding has to face a famous issue, the "Xu Gong Gai" (許功蓋) garbling problem. The part excerpted below comes from Taiwan.

A brief talk on "Xu Gong Gai" (許功蓋)

What is 許功蓋? Probably anyone who has built a site with PHP and MySQL knows it all too well, and it is a real pain on such sites ... (excerpted from the osCommerce shopping website)

Look closely at the cause and you will find that the culprit is not 許功蓋 at all but the so-called Big Five code, BIG5. Accounts of BIG5's history vary, and we will not comment here on the rights and wrongs of the original design; readers interested in that history can search for "BIG5" and will find plenty of material. If BIG5 has problems, why is it the encoding used by practically all Traditional Chinese systems today? Around 1983-1984, personal computers were gradually spreading in Taiwan and packaged software was becoming popular. To solve the problem of processing Chinese on computers, a Chinese internal code was developed: the BIG5 code we all know. Anyone who lived through that period will remember the marketing approach of the ETen Chinese System (倚天中文). While foreign software vendors were promoting the concept of licensed software, ETen did the opposite and allowed campuses and even ordinary users to copy its Chinese system freely and unconditionally. ETen therefore became almost the de facto Chinese standard of the time, and because it used BIG5 encoding, this is the main reason BIG5 is still in use today.

Where did BIG5 go wrong? It failed to exclude the control characters of ASCII (American Standard Code for Information Interchange). Anyone who has studied computing knows that ASCII characters are stored one per byte, and 1 byte = 8 bits, so a single byte can represent at most 2 ^ 8 = 256 characters. That is more than enough for the 26 letters of English, but nowhere near enough for tens of thousands of Chinese characters, so a Chinese character must be represented with two bytes. For example, the code for 中 is "A4A4". In the BIG5 design, to avoid conflicts with ASCII, the first byte of each Chinese character uses only high values (129-255), but the second byte also uses some low values (1-128; in practice 0x40-0x7E) that overlap with ASCII. This is the biggest flaw of BIG5 in later use.

Why does BIG5 cause trouble for PHP and MySQL? There are three reasons.

First, SQL injection. As we know, verifying a login against data in a MySQL database uses a statement like:

SELECT * FROM administrator WHERE id = 'abc' AND passwd = '1234'

Suppose I have a page login.php containing a form for entering the id and passwd values. If someone enters the following directly in the URL:

login.php?id=abc&passwd='%20OR%201=1%20OR%201='

Since %20 is decoded as a space, the SQL statement finally sent to MySQL is:

SELECT * FROM administrator WHERE id = 'abc' AND passwd = '' OR 1=1 OR 1=''
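How string concatenation produces this statement can be reproduced in a minimal sketch. It is written in Python with an in-memory SQLite database standing in for PHP and MySQL, purely so that it is self-contained; the function name build_login_query and the sample row are assumptions for illustration, not code from the original login.php.

```python
import sqlite3


def build_login_query(user_id: str, passwd: str) -> str:
    # Naive concatenation of user input into the SQL text (the vulnerable pattern).
    return (
        "SELECT * FROM administrator "
        f"WHERE id = '{user_id}' AND passwd = '{passwd}'"
    )


# Hypothetical table with a single administrator account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE administrator (id TEXT, passwd TEXT)")
conn.execute("INSERT INTO administrator VALUES ('abc', '1234')")

# An ordinary wrong password matches nothing.
wrong = build_login_query("abc", "0000")
rows_wrong = conn.execute(wrong).fetchall()

# The crafted value from the URL, after %20 has been decoded to spaces.
injected = build_login_query("abc", "' OR 1=1 OR 1='")
rows_injected = conn.execute(injected).fetchall()

print(injected)
print(rows_wrong, rows_injected)
```

In this sketch the wrong password returns no rows, while the crafted value returns the administrator row without the real password ever being supplied.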

The condition in this injected statement is always true, so verification is bypassed and the attacker obtains administrator rights. The single quote is therefore a first-rate hidden killer.

Second, PHP's escape character. 5C (the backslash, \) is used in PHP as the escape character: when the text in a string contains single or double quotes, you usually need to put a \ in front of them so that these special characters are handled correctly. A common case is a double-quoted string that itself contains double quotes.

If you leave the backslash out, you immediately get an error message:

Parse error: parse error, unexpected T_LNUMBER, expecting ',' or ';' in C:\AppServ\www\code.php on line 2

This is where the problem arises. Suppose we want to insert a record into the database, such as:

INSERT INTO mytable VALUES ('功');

Since the second byte of a character such as 功 is 5C, and it is followed directly by the closing single quote, the pair is interpreted as an escaped quote (\'). The final single quote is treated as text, the SQL statement is left without its closing quote, and the record is not written to the database; an error occurs instead.

Third, the addslashes and stripslashes functions. To keep single quotes from being used as a tool for attacking the database, the addslashes function is generally used to put a backslash in front of each single quote in a variable. The passwd input ' OR 1=1 OR 1=' that could fool verification becomes \' OR 1=1 OR 1=\' and the single quote can no longer be used as an attack tool. However, the escape characters may then be written into the database as part of the input text, so when reading the data back out you must use the stripslashes function to delete them; otherwise the displayed data contains extra backslashes. stripslashes, however, also deletes the 5C second byte of characters such as 許, 功 and 蓋, which is why they come out garbled.

BIG5 does not trouble only PHP; even Unix cannot escape it. 7C is the pipe character '|', and anyone who has used Unix pipes knows what it does. A simple example: if you upload a file named "四.doc" by FTP, the file name immediately turns into "North.doc". Many people have had this experience. The reason is simply that the BIG5 code of the character 四 is "A57C": when Unix sees 7C it is baffled as to why a "|" is being uploaded, and things go wrong from there. So any Chinese character whose second byte is "7C" is equally unable to escape the BIG5 problem.
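The 5C and 7C collisions described above can be observed at the byte level. The sketch below uses Python and its built-in big5 codec instead of PHP so that it is self-contained; naive_addslashes and naive_stripslashes are hypothetical byte-wise stand-ins written for illustration, not PHP's actual implementations (NUL handling is omitted).

```python
BACKSLASH = 0x5C


def naive_addslashes(raw: bytes) -> bytes:
    """Put a backslash before every quote or backslash byte, ignoring character boundaries."""
    out = bytearray()
    for b in raw:
        if b in (0x27, 0x22, BACKSLASH):
            out.append(BACKSLASH)
        out.append(b)
    return bytes(out)


def naive_stripslashes(raw: bytes) -> bytes:
    """Drop each backslash byte and keep the byte that follows it."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == BACKSLASH:
            i += 1                      # skip the backslash itself
            if i == len(raw):
                break                   # trailing backslash: nothing left to keep
        out.append(raw[i])
        i += 1
    return bytes(out)


gong = "功".encode("big5")              # two bytes, the second one is 5C
statement = b"INSERT INTO mytable VALUES ('" + gong + b"');"

# The closing quote is directly preceded by 5C, so a byte-oriented parser sees \'.
closing_quote_escaped = statement.endswith(b"\\');")

stored = naive_addslashes(gong)         # the 5C byte gets doubled
damaged = naive_stripslashes(gong)      # the 5C byte is removed, leaving half a character
round_trip = naive_stripslashes(stored)

# The 7C case: the BIG5 bytes of this file name contain a literal '|'.
has_pipe = b"|" in "四.doc".encode("big5")

print(gong.hex(), closing_quote_escaped, stored.hex(), damaged.hex(), has_pipe)
```

In this sketch the character survives only when every stripslashes call is paired with an earlier addslashes call; applying stripslashes to raw BIG5 text leaves a lone lead byte behind.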

Solutions to the 許功蓋 problem

First, remove the stripslashes calls from the code where the problem occurs. This makes 許功蓋 display correctly but does little else: the SQL injection and escape-character problems on the MySQL server still exist.

Second, use the BIG5_FUNC string-handling function set. If you have studied how osCommerce (OSC) handles 許功蓋, you will have found a folder named BIG5_FUNC under the [Webroot]/catalog/includes/languages/tchinese directory. It is a function set that someone on the Internet wrote to solve BIG5 problems, and we call it the "BIG5 string processing function set".

Postscript

Because BIG5 causes problems almost everywhere, the author believes that 許功蓋 has long plagued everyone who builds sites. Here the author has tried to offer a solution to those who want to set up a site but are troubled by this problem. Finally, it should be stated that the problems with BIG5 may not end here; some may be quite tricky and even beyond what the author can solve. If you have any questions, you are welcome to visit the online Garky store community, where we believe many enthusiastic people will help solve BIG5 problems.
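The contents of BIG5_FUNC are not reproduced in this article, so the following is only a guess at the kind of helper such a function set could contain: it walks the bytes, treats any byte of 0x81 or above as a lead byte that owns the following byte, and only then counts or escapes characters. It is a Python sketch; big5_split, big5_strlen and big5_addslashes are hypothetical names, not the functions shipped with osCommerce.

```python
from typing import List


def big5_split(raw: bytes) -> List[bytes]:
    """Split a BIG5 byte string into one-byte and two-byte characters."""
    chars = []
    i = 0
    while i < len(raw):
        # A byte of 0x81 or above is a lead byte and owns the byte after it.
        width = 2 if raw[i] >= 0x81 and i + 1 < len(raw) else 1
        chars.append(raw[i:i + width])
        i += width
    return chars


def big5_strlen(raw: bytes) -> int:
    """Length in characters rather than bytes."""
    return len(big5_split(raw))


def big5_addslashes(raw: bytes) -> bytes:
    """Escape quotes and backslashes only when they are real single-byte characters."""
    special = (b"'", b'"', b"\\")
    return b"".join(b"\\" + c if c in special else c for c in big5_split(raw))


sample = "許's 功".encode("big5")
pieces = big5_split(sample)
length = big5_strlen(sample)
escaped = big5_addslashes(sample)
print(pieces, length, escaped)
```

Because the 5C inside 許 and 功 is consumed as a trail byte, only the genuine single quote is escaped and both characters stay intact.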

Characters in BIG5 that easily become garbled

- Garylee

Because of an oversight in the original design, some character codes contain bytes that coincide with control characters, which easily causes misinterpretation in some programming environments, so the Chinese text displayed is not what we intended. We therefore have to be especially careful with Chinese processing when writing programs. For the characters that most easily trigger the problem, it is best to test your program against the ones listed below.

ASCII (5C) == "\"

A45C ?          AE5C 娉         B85C 稞         C25C 摆
A55C 功 (power)  AF5C 珮         B95C uranium    C35C 黠
A65C 吒         B05C leopard    BA5C 暝         C45C 孀
A75C 吭         B15C 崤         BB5C 蓋 (cover)  C55C 8
A85C 沔         B25C tears      BC5C 墦         C65C 蹑
A95C 坼         B35C 許 (Xu)     BD5C Valley
AA5C 殁         B45C 廄         BE5C reading
AB5C Yu         B55C            BF5C
AC5C Dium       B65C q          C05C meal
AD5C 苒         B75C            C15C

ASCII (7C) == "|"

A47C 弋         AE7C path       B87C            C27C 瓮
A57C 四 (four)   AF7C            B97C            C37C 牍
A67C sail       B07C hospital   BA7C leak       C47C braided
A77C pit        B17C sad        BB7C commandment  C57C stack
A87C fertility  B27C Li         BC7C evil thought  C67C stork
A97C still      B37C Di         BD7C curse
AA7C Chi        B47C rub        BE7C Lu
AB7C pharynx    B57C tax        BF7C cake
AC7C er         B67C leap       C07C taste
AD7C Tiao       B77C will       C17C raised
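A list like the one above can be regenerated rather than typed by hand. The sketch below is a Python illustration that relies on the built-in big5 codec; the function name chars_with_trail_byte and the lead-byte range A4-C6 (taken from the codes listed above) are assumptions made for this example.

```python
def chars_with_trail_byte(trail: int, leads=range(0xA4, 0xC7)) -> dict:
    """Map each BIG5 code whose second byte is `trail` to the character it encodes."""
    found = {}
    for lead in leads:
        code = bytes([lead, trail])
        try:
            found[code.hex().upper()] = code.decode("big5")
        except UnicodeDecodeError:
            continue                    # this code is not assigned in the codec's table
    return found


backslash_chars = chars_with_trail_byte(0x5C)   # second byte is "\"
pipe_chars = chars_with_trail_byte(0x7C)        # second byte is "|"

for code, char in backslash_chars.items():
    print(code, char)
```

The resulting characters can be pasted into a form or a file name as test input for a program that has to handle BIG5.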

Solutions

PHP BIG5 FUNCTION Credit Solution

RAP solution

to be continued

When reposting, please credit the original address: https://www.9cbs.com/read-93282.html
