Byte stream encoding is so complex

xiaoxiao 2021-03-06


1) In many situations we receive a byte stream and need to know its encoding before we can decode it.

2) As far as I could find, C# currently has no ready-made function for detecting the encoding. After discussing it with colleagues, we noted that a UTF-8 file has a 3-byte header, "EF BB BF" (called the BOM, Byte Order Mark). Wouldn't checking for this header solve the problem? The code is as follows:

    // Check whether the uploaded file is UTF-8 encoded.
    // buff is the byte stream of the uploaded file; fileLength is its length.
    Encoding enc = Encoding.UTF8;
    byte[] testEncBuff = enc.GetPreamble();
    if (fileLength > testEncBuff.Length
        && testEncBuff[0] == buff[0]
        && testEncBuff[1] == buff[1]
        && testEncBuff[2] == buff[2])
    {
        // The file is UTF-8 encoded
        string buffString = enc.GetString(buff);
    }

But then we discovered that not every UTF-8 encoded file carries a BOM. How can that case be handled?
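As a small illustration of the limitation just described, the following sketch wraps the BOM comparison in a helper. The class name BomCheck and the methods HasUtf8Bom and DecodeUtf8 are hypothetical names chosen for this example and are not part of the original code; only Encoding.UTF8.GetPreamble() and Encoding.GetString() come from .NET itself.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class BomCheck
{
    // Returns true when buff starts with the UTF-8 preamble (EF BB BF).
    public static bool HasUtf8Bom(byte[] buff)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        if (buff == null || buff.Length < preamble.Length)
        {
            return false;
        }
        for (int i = 0; i < preamble.Length; i++)
        {
            if (buff[i] != preamble[i])
            {
                return false;
            }
        }
        return true;
    }

    // Decodes buff as UTF-8, skipping the BOM bytes when they are present.
    public static string DecodeUtf8(byte[] buff)
    {
        int offset = HasUtf8Bom(buff) ? Encoding.UTF8.GetPreamble().Length : 0;
        return Encoding.UTF8.GetString(buff, offset, buff.Length - offset);
    }
}
```

With such a helper, a buffer produced by Encoding.UTF8.GetBytes("中文") is reported as having no BOM even though it is valid UTF-8, which is exactly the case the header check misses.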

3) In the end, a file without a BOM can only be identified by examining it byte by byte. Fortunately, someone has already solved this problem. I recommend reading:
http://dev.9cbs.net/develop/Article/10961.shtm
http://dev.9cbs.net/develop/article/10/10962.shtm
There, essentially every encoding is detected by checking the ranges that the byte values fall into. The Java code is easy to port to .NET; below is the C# code for the UTF-8 detection part:

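    int utf8_probability(byte[] rawtext)
    {
        int score = 0;
        int i, rawtextlen = 0;
        int goodbytes = 0, asciibytes = 0;

        // Maybe also use UTF8 Byte Order Mark: EF BB BF

        // Check to see if characters fit into acceptable ranges
        rawtextlen = rawtext.Length;
        for (i = 0; i < rawtextlen; i++)
        {
            // NOTE: the loop body is truncated in this copy of the article; the branches
            // below are reconstructed from the surviving "two bytes" fragment and the
            // standard UTF-8 byte ranges, so treat them as an assumption.
            int m_rawInt0 = rawtext[i];
            int m_rawInt1 = i + 1 < rawtextlen ? rawtext[i + 1] : -1;
            int m_rawInt2 = i + 2 < rawtextlen ? rawtext[i + 2] : -1;

            if (m_rawInt0 < 128)
            {
                asciibytes++; // One byte: ignore ASCII, it can throw off the count
            }
            else if (256 - 64 <= m_rawInt0 && m_rawInt0 <= 256 - 33 && // Two bytes
                     256 - 128 <= m_rawInt1 && m_rawInt1 <= 256 - 65)
            {
                goodbytes += 2;
                i++;
            }
            else if (256 - 32 <= m_rawInt0 && m_rawInt0 <= 256 - 17 && // Three bytes
                     256 - 128 <= m_rawInt1 && m_rawInt1 <= 256 - 65 &&
                     256 - 128 <= m_rawInt2 && m_rawInt2 <= 256 - 65)
            {
                goodbytes += 3;
                i += 2;
            }
        }

        // Pure ASCII input gives no evidence either way (also avoids dividing by zero)
        if (asciibytes == rawtextlen) { return 0; }

        score = (int)(100 * ((float)goodbytes / (float)(rawtextlen - asciibytes)));

        // If not above 98, reduce to zero to prevent coincidental matches
        // Allows for some (few) badly formed sequences
        if (score > 98) { return score; }
        else if (score > 95 && goodbytes > 30) { return score; }
        else { return 0; }
    }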
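As a cross-check on the scoring heuristic above, one alternative (not taken from the referenced articles) is to let .NET's own decoder reject malformed input. The sketch below assumes that a strict UTF8Encoding, constructed with throwOnInvalidBytes set to true, is acceptable for the data at hand; StrictUtf8 and IsValidUtf8 are hypothetical names used only for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class StrictUtf8
{
    // First argument: do not emit a BOM when encoding.
    // Second argument (throwOnInvalidBytes): throw on malformed input
    // instead of silently substituting U+FFFD.
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Returns true when every byte sequence in buff is well-formed UTF-8.
    public static bool IsValidUtf8(byte[] buff)
    {
        if (buff == null)
        {
            return false;
        }
        try
        {
            Strict.GetString(buff);
            return true;
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            return false;
        }
    }
}
```

Unlike the score-based function, this check also accepts pure ASCII, since ASCII is valid UTF-8. A short text in a legacy encoding can occasionally be well-formed UTF-8 by coincidence, so passing this check is best read as "could be UTF-8" rather than as proof.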

References:
Character set detection program (part 1): detecting GB2312, BIG5 ... http://dev.9cbs.net/develop/Article/10/Article/10/10961.SHTM
Hello Unicode: notes on Chinese text processing in Java http://www.chedong.com/tech/hello_unicode.html

When reposting, please cite the original address: https://www.9cbs.com/read-62958.html
