On Character Sets
03-8-8 18:02 by leezy_2000
Perhaps because their authors are American, several well-known Windows books (such as "Windows Programming" and Jeffrey Richter's "Windows Core Programming") do not treat character sets very carefully. Here I try to clarify some commonly confused concepts and point out a mistake that is easy to make (I have made it myself).
First, a few concepts:
Character set: according to how characters are encoded, character sets can be divided into three categories.
- Single-byte character set (SBCS): every character is encoded in one byte, for example ANSI.
- Multi-byte character set (MBCS): a character is encoded in either one byte or several bytes, for example DBCS, GB2312, and the like.
- Wide character set: every character is encoded in two bytes, for example Unicode.
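To make the multi-byte case concrete, here is a minimal C++ sketch that counts characters (not bytes) in a double-byte string. It assumes the usual GB2312 byte layout, in which both bytes of a two-byte character have the high bit set and ASCII occupies a single byte. The names isLeadByte and countCharacters are hypothetical helpers, not Windows APIs.

```cpp
#include <cstddef>
#include <string>

// Assumption: a GB2312-style encoding where a byte with the high bit set
// starts a two-byte character and any other byte is a one-byte character.
bool isLeadByte(unsigned char b) {
    return (b & 0x80) != 0;
}

// Counts characters in a double-byte string by stepping over trail bytes.
std::size_t countCharacters(const std::string& bytes) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        // A lead byte consumes its trail byte as well, unless the data is truncated.
        i += (isLeadByte(b) && i + 1 < bytes.size()) ? 2 : 1;
        ++count;
    }
    return count;
}
```

For the five bytes 41 D6 D0 CE C4 (the letter A followed by two Chinese characters in GB2312), the byte length is 5 but the character count is 3, which is why code that assumes one byte per character breaks on MBCS text.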
Code page: Unicode and DBCS contain so many codes that they have to be organized in some way to be convenient to use. The method is to place the codes of different countries in different code pages.
Relationship between character set and code page: as you can see from the above, for Unicode and DBCS a code page is a subset of the character set. However, an SBCS character set (such as ANSI) or an MBCS character set other than DBCS (such as GB2312) corresponds to exactly one code page.
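The pairing between a font character set and a code page can be sketched as a plain lookup table. The C++ below is only an illustration: codePageForCharset is a hypothetical helper, and the numeric values are the character set identifiers from the Windows headers with their usual ANSI code pages, repeated as literals so the sketch compiles without windows.h. On Windows itself this lookup is what TranslateCharsetInfo, discussed later in this article, performs.

```cpp
// Hypothetical helper: maps a GDI character set identifier to the ANSI code
// page normally associated with it. Returns 0 for an unknown character set.
unsigned codePageForCharset(int charset) {
    switch (charset) {
        case 0:   return 1252; // ANSI_CHARSET
        case 128: return 932;  // SHIFTJIS_CHARSET
        case 129: return 949;  // HANGUL_CHARSET
        case 134: return 936;  // GB2312_CHARSET
        case 136: return 950;  // CHINESEBIG5_CHARSET
        case 161: return 1253; // GREEK_CHARSET
        case 162: return 1254; // TURKISH_CHARSET
        case 177: return 1255; // HEBREW_CHARSET
        case 178: return 1256; // ARABIC_CHARSET
        case 186: return 1257; // BALTIC_CHARSET
        case 204: return 1251; // RUSSIAN_CHARSET
        case 222: return 874;  // THAI_CHARSET
        case 238: return 1250; // EASTEUROPE_CHARSET
        default:  return 0;    // not covered by this sketch
    }
}
```

A font whose character set is GB2312_CHARSET therefore expects bytes in code page 936, regardless of which code page is the system default.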
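Now look at the following piece of code:
void ConvertAndOutputString(HDC hdc, LPWSTR wstr, int length, int x, int y)
{
    int nRet;
    int sizeBuffer = 2 * length;   // at most two bytes per character in a DBCS
    char *lpBuffer = new char[sizeBuffer];
    // CP_ACP: always convert with the system default ANSI code page
    nRet = WideCharToMultiByte(CP_ACP, 0, wstr, length,
                               lpBuffer, sizeBuffer, NULL, NULL);
    // ANSI version of TextOut, because lpBuffer holds bytes, not wide characters
    TextOutA(hdc, x, y, lpBuffer, nRet);
    delete [] lpBuffer;
}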
This code is very simple: it converts a wide string to a DBCS string and outputs it at the specified coordinates. Jeffrey Richter uses almost the same approach on page 26 of "Windows Core Programming". But the code actually has a problem: the code page should not be hard-coded (CP_ACP) when converting the string. It should be obtained dynamically from the current font. Otherwise, in some cases the Unicode characters in wstr will not be converted to the correct codes. If you use the code above to output Chinese, you may have the "good fortune" of seeing many question marks added to your string.
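The question marks come from converting into a code page that has no slot for the character. The following C++ sketch is a toy simulation of that behavior, not the Windows API: ToyCodePage, kToy936, kToy1252, and convertWithDefault are hypothetical names, and each table holds only a couple of entries for illustration (the byte values shown for the two Chinese characters are their GB2312 codes).

```cpp
#include <map>
#include <string>

// Toy stand-in for a code page: Unicode code point -> encoded bytes.
using ToyCodePage = std::map<char32_t, std::string>;

// Two tiny, deliberately incomplete tables (illustration only).
const ToyCodePage kToy936 = {
    {U'\u4E2D', "\xD6\xD0"},  // first Chinese character of the sample text
    {U'\u6587', "\xCE\xC4"},  // second Chinese character of the sample text
};
const ToyCodePage kToy1252 = {
    {U'\u00E9', "\xE9"},      // Latin small letter e with acute
};

// Converts text into the given toy code page. ASCII passes through unchanged;
// any character missing from the table is replaced by defaultChar.
std::string convertWithDefault(const std::u32string& text,
                               const ToyCodePage& page,
                               char defaultChar = '?') {
    std::string out;
    for (char32_t ch : text) {
        if (ch < 0x80) {
            out += static_cast<char>(ch);
            continue;
        }
        auto it = page.find(ch);
        if (it != page.end()) {
            out += it->second;
        } else {
            out += defaultChar;  // unmappable: this is where '?' appears
        }
    }
    return out;
}
```

Converting the letter A followed by the two Chinese characters with kToy1252 gives "A??", while kToy936 gives the five bytes 41 D6 D0 CE C4. The same text survives or is lost depending only on which code page is chosen.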
The solution is also very simple, but first you have to be familiar with the following two API functions:
int GetTextCharset(HDC hdc); // Returns the character set of the font currently selected into the device context.
BOOL TranslateCharsetInfo(
    DWORD *pSrc,          // source information
    LPCHARSETINFO lpCs,   // receives the character set information
    DWORD dwFlags         // translation option
);
This function converts between character sets, code pages, and font signatures (FONTSIGNATURE). The result of the conversion is placed in lpCs. dwFlags indicates which conversion to perform, for example from a character set to a code page.
Pay special attention to the pSrc parameter: when converting from a character set to a code page, the character set value itself is passed in this pointer-typed parameter, rather than a pointer to the value. So for the string output function above, just add the following lines to ensure that the string does not turn into garbled characters during the conversion.
void ConvertAndOutputString(HDC hdc, LPWSTR wstr, int length, int x, int y)
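{
    int nRet;
    int sizeBuffer = 2 * length;   // at most two bytes per character in a DBCS
    char *lpBuffer = new char[sizeBuffer];
    // Ask the device context which character set its current font uses
    int charset = GetTextCharset(hdc);
    CHARSETINFO csInfo = {0};
    // With TCI_SRCCHARSET the character set value is passed as the pointer itself
    TranslateCharsetInfo((DWORD *)(DWORD_PTR)charset, &csInfo, TCI_SRCCHARSET);
    // Convert with the code page that matches the font, not CP_ACP
    nRet = WideCharToMultiByte(csInfo.ciACP, 0, wstr, length,
                               lpBuffer, sizeBuffer, NULL, NULL);
    // ANSI version of TextOut, because lpBuffer holds bytes, not wide characters
    TextOutA(hdc, x, y, lpBuffer, nRet);
    delete [] lpBuffer;
}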
Finally, to restate the point of this article: determine the code page dynamically when converting between character sets.
For further details of the functions and structures involved, please refer to MSDN.