Introduction to Unicode Encoding Conversion [Repost]

xiaoxiao  2021-03-06


1 Introduction

1.1 Purpose

This document describes how to convert between character encodings on Win32, Linux, and other platforms.

1.2 Overview

ANSI C defines a common set of encoding conversion functions: setlocale, mbstowcs, and wcstombs. These functions work as expected on the Win32 platform, but in the tests described in section 3.1 they did not give correct results on Linux. Encoding conversion on Linux therefore relies on the iconv_open, iconv, and iconv_close functions.

2 Encoding conversion on the Win32 platform

2.1 ANSI C Method

On the Win32 platform, the ANSI C functions setlocale, wcstombs, and mbstowcs, as well as L"str" literals, all work correctly. The locale argument of setlocale may be ".ACP", "", NULL, "C", "chinese", "Chinese_People's Republic of China.936", and so on. ".ACP" is the usual choice; it selects the ANSI code page currently used by the operating system. Note the subtle differences among the others: "" selects the user-default locale obtained from the operating system, "C" selects the minimal C locale, and NULL does not change the locale at all but only returns the current setting. L"str" represents the UnicodeLittle (UTF-16LE) encoding of str.

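#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
using namespace std;

void TestBmpStostr(const char *pc_locale, const wchar_t *pc_wnm,
                   char *pc_data, int npcdatalen)
{
    // Select the locale that governs wide-to-multibyte conversion.
    char *pclocale = setlocale(LC_CTYPE, pc_locale);
    cout << (pc_locale == NULL ? "(null)" : pc_locale) << " locale = "
         << (pclocale == NULL ? "false" : pclocale) << endl;
    // Convert through the "%ls" format specifier; a negative result means failure.
    memset(pc_data, 0, npcdatalen);
    int n = snprintf(pc_data, npcdatalen, "%ls", pc_wnm);
    if (n >= 0)
        cout << "snprintf convert = " << pc_data << endl;
    // Convert through wcstombs, which returns (size_t)-1 on failure.
    memset(pc_data, 0, npcdatalen);
    size_t len = wcstombs(pc_data, pc_wnm, npcdatalen - 1);
    if (len != (size_t)-1)
        cout << "wcstombs convert = " << pc_data << endl;
}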
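The pattern used in TestBmpStostr can also be wrapped into a small reusable helper. The following is only a sketch under stated assumptions, not code from the original project: the name wide_to_multibyte is hypothetical, and the size query relies on the POSIX and Microsoft CRT behaviour in which wcstombs called with a null destination returns the number of bytes required.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not from the original article): converts a wide string
// to a multibyte string under the named locale. Returns false if the locale
// cannot be selected or a character cannot be represented in it.
bool wide_to_multibyte(const wchar_t *wide, const char *locale_name,
                       std::string &result)
{
    // Remember the current LC_CTYPE setting so it can be restored afterwards.
    const char *current = std::setlocale(LC_CTYPE, NULL);
    std::string saved = current ? current : "C";

    if (std::setlocale(LC_CTYPE, locale_name) == NULL)
        return false;                        // locale unknown or not installed

    bool ok = false;
    // First pass: a null destination only asks for the required byte count
    // (assumed POSIX / Microsoft CRT behaviour).
    size_t needed = std::wcstombs(NULL, wide, 0);
    if (needed != (size_t)-1) {
        std::vector<char> buf(needed + 1);   // room for the terminating '\0'
        // Second pass: the actual conversion.
        size_t written = std::wcstombs(&buf[0], wide, buf.size());
        if (written != (size_t)-1) {
            result.assign(&buf[0], written);
            ok = true;
        }
    }
    std::setlocale(LC_CTYPE, saved.c_str()); // restore the previous locale
    return ok;
}
```

On Win32 the helper would typically be called with ".ACP" as the locale name, for example wide_to_multibyte(L"CA", ".ACP", text). Restoring the previous locale keeps the call from changing the behaviour of unrelated code, since setlocale affects the whole process.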

2.2 MultiByteToWideChar and WideCharToMultiByte method

The Win32 platform provides two encoding conversion functions, MultiByteToWideChar and WideCharToMultiByte. Converting between two different code pages requires going through UnicodeLittle (UTF-16LE) as an intermediate encoding. For example:

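#include <windows.h>
#include <cstring>

// hgchar and hgchararray are types from the author's own code base (not shown
// here); they appear to be a char type and a resizable array of it.
BOOL ACP2UTF8(const hgchar *pc_acp, hgchararray &charrutf8)
{
    hgchararray charrunicode;
    int nlen = (int)strlen(pc_acp) + 1;          // input bytes plus the terminator
    charrunicode.resize(nlen * sizeof(WCHAR));   // at most one UTF-16 unit per input byte
    charrutf8.resize(nlen * 3);                  // at most three UTF-8 bytes per UTF-16 unit
    // Step 1: system ANSI code page -> UTF-16LE.
    if (0 == MultiByteToWideChar(CP_ACP, 0, pc_acp, -1, (LPWSTR)&charrunicode[0], nlen))
        return FALSE;
    // Step 2: UTF-16LE -> UTF-8.
    if (0 != WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR)&charrunicode[0], -1, &charrutf8[0], nlen * 3, NULL, NULL))
        return TRUE;
    return FALSE;
}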
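To make the second step of ACP2UTF8 more concrete, the following portable sketch shows what a UTF-16 to UTF-8 conversion does at the bit level. It is a hand-written illustration only: the function name utf16_to_utf8 is hypothetical, and it is not how WideCharToMultiByte is implemented internally.

```cpp
#include <cstddef>
#include <string>

// Illustrative only: encodes UTF-16 code units as UTF-8 bytes.
// Returns false when the input contains an unpaired surrogate.
bool utf16_to_utf8(const std::u16string &in, std::string &out)
{
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 >= in.size())
                return false;
            char32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return false;
            // Combine the surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            return false;
        }
        if (cp < 0x80) {                               // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                       // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                     // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                       // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return true;
}
```

A code unit in the range 0x0800 to 0xFFFF, which covers most Chinese characters, becomes three UTF-8 bytes. This is why an output buffer of only twice the input length can be too small when the source code page is single-byte.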

3 Encoding conversion on the Linux platform

3.1 ANSI C Method

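In these tests, the ANSI C functions setlocale, mbstowcs, and wcstombs could not be used on Linux and gave wrong results (they depend on the requested locale being installed on the system). In addition, the literal L"str" is not converted into Unicode. For example, for L"福建省运营根CA" ("Fujian Province Operating Root CA"), the result on Red Hat is

B8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 A1 00 00 00 D4 00 00 00 CB 00 00 00 D3 00 00 00 AA 00 00 00 B8 00 00 00 F9 00 00 00 43 00 00 00 41 00 00 00 00 00 00 00

and the result on Cygwin is

B8 00 A3 00 BD 00 A8 00 CA 00 A1 00 D4 00 CB 00 D3 00 AA 00 B8 00 F9 00 43 00 41 00 00 00

but the actual UnicodeLittle encoding is

8F 79 FA 5E 01 77 D0 8F 25 84 39 68 43 00 41 00 00 00 00 00

The Cygwin output is simply each byte of the GB-encoded source text widened to two bytes instead of being decoded; the tail of the Red Hat output shows the same pattern with four bytes per wchar_t.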
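The difference between the byte dumps above can be reproduced with a short sketch. All three helper names below are hypothetical and exist only for illustration; widen_each_byte imitates the faulty behaviour by turning every source byte into its own code unit without decoding it.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helpers for illustration; none of these names come from the article.

// Formats raw bytes as upper-case hex pairs separated by spaces.
std::string hex_dump(const std::string &bytes)
{
    std::string text;
    char pair[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(pair, sizeof pair, "%02X",
                      static_cast<unsigned char>(bytes[i]));
        if (i != 0)
            text += ' ';
        text += pair;
    }
    return text;
}

// Serializes UTF-16 code units as little-endian bytes, whatever the host byte order.
std::string to_utf16le_bytes(const std::u16string &units)
{
    std::string bytes;
    for (char16_t u : units) {
        bytes += static_cast<char>(u & 0xFF);   // low byte first
        bytes += static_cast<char>(u >> 8);     // then high byte
    }
    return bytes;
}

// Imitates the faulty literal handling: one code unit per source byte, no decoding.
std::u16string widen_each_byte(const std::string &raw)
{
    std::u16string units;
    for (unsigned char c : raw)
        units += static_cast<char16_t>(c);
    return units;
}
```

Feeding the GB bytes B8 A3 through widen_each_byte and dumping the result gives B8 00 A3 00, which matches the start of the Cygwin output, whereas the correctly decoded code unit 0x798F is dumped as 8F 79, the start of the actual UnicodeLittle encoding.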

3.2 iconv method

On Linux, encoding conversion is done mainly with the iconv family of functions. The basic calling sequence is iconv_open, iconv, iconv_close. The two parameters of iconv_open name the two encodings involved; note that the target encoding comes first and the source encoding second. The iconv function performs the conversion. Its last four parameters are in/out parameters: on return they hold, respectively, a pointer to the byte after the last converted input byte, the number of input bytes not yet converted, a pointer to the next free byte of the output buffer, and the number of bytes remaining in the output buffer. For example:

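#include <iconv.h>
#include <cstring>
#include <iostream>
using namespace std;

// HG_LARGE_STR_LEN, hgbyte, and Mem_Object::binarytoascii come from the
// author's own code base and are not shown here.
int TestUnicode()
{
#ifdef __linux__
#define ICONV_CONST
#else
#define ICONV_CONST const
#endif
    // The literal is GB-encoded only if this source file is saved in a GB encoding.
    char inbuf[HG_LARGE_STR_LEN] = "福建省运营根CA";
    char outbuf[HG_LARGE_STR_LEN];
    ICONV_CONST char *pin = inbuf;
    char *pout = outbuf;
    size_t inleft = strlen(pin) + 1;
    size_t outleft = HG_LARGE_STR_LEN;
    // Target encoding first, source encoding second.
    iconv_t cd = iconv_open("UNICODEBIG", "CN-GB");
    if (cd == (iconv_t)-1)
        return -1;
    if (iconv(cd, &pin, &inleft, &pout, &outleft) == (size_t)-1) {
        iconv_close(cd);   // release the descriptor on the error path as well
        return -1;
    }
    iconv_close(cd);
    // Render the converted bytes as text so they can be printed.
    Mem_Object::binarytoascii((hgbyte *)outbuf, HG_LARGE_STR_LEN - outleft, inbuf);
    cout << "Actual WCS = " << inbuf << endl;
    return 0;
}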

The encoding names accepted as the two parameters of iconv_open can be listed with the command iconv -l > codepage.txt.
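The iconv_open, iconv, iconv_close sequence can be wrapped so that callers do not have to size the output buffer in advance. The following is a minimal sketch under stated assumptions, not the author's code: the helper name convert_encoding is hypothetical, and it assumes the glibc prototype of iconv, whose input pointer is char ** rather than const char **.

```cpp
#include <cerrno>
#include <iconv.h>
#include <string>

// Hypothetical helper (not from the original article): converts `input` from
// encoding `from` to encoding `to`. Returns false on any iconv failure.
bool convert_encoding(const char *to, const char *from,
                      const std::string &input, std::string &output)
{
    iconv_t cd = iconv_open(to, from);      // target encoding first, source second
    if (cd == (iconv_t)-1)
        return false;                       // unknown or unsupported encoding name

    std::string in_copy = input;            // iconv needs a non-const input pointer
    char *pin = &in_copy[0];
    size_t inleft = in_copy.size();
    output.clear();

    bool ok = true;
    char buf[256];
    while (inleft > 0) {
        char *pout = buf;
        size_t outleft = sizeof buf;
        // iconv advances pin/pout and decreases inleft/outleft as it works.
        size_t rc = iconv(cd, &pin, &inleft, &pout, &outleft);
        int err = errno;                    // save before any other call can change it
        output.append(buf, sizeof buf - outleft);
        // E2BIG only means the chunk buffer filled up; loop again for the rest.
        if (rc == (size_t)-1 && err != E2BIG) {
            ok = false;                     // invalid or incomplete input sequence
            break;
        }
    }
    // Stateful target encodings would also need a final flush call; omitted here.
    iconv_close(cd);
    return ok;
}
```

With the names used in this document, a call might look like convert_encoding("UNICODEBIG", "CN-GB", gb_text, result), where gb_text holds GB-encoded bytes. Whether a given name is accepted depends on the iconv implementation, so checking the output of iconv -l first is advisable.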

4 Encoding conversion on the Cygwin platform

Encoding conversion on Cygwin works the same way as on Linux, except that the iconv* functions are provided by libiconv.a, so programs must link against that library (for example with -liconv).

5 Encoding conversion on other platforms

Encoding conversion on other platforms such as Solaris and AIX has not been studied, but it should be similar to conversion on Linux.

When reposting, please cite the original address: https://www.9cbs.com/read-100835.html
