http://blog.9cbs.neet/mynote/archive/2005/02/23/298179.aspx
Unicode programming
Getting Started
Introduction
If you have written programs for users in non-English-speaking regions such as China, Japan, Eastern Europe or the Middle East, you are probably familiar with the Unicode character set. Especially with Visual C++ / MFC, if you want your application to reach a wider audience you must consider Unicode compatibility in your code, so that the same source can be built to run in ASCII mode or in Unicode mode. This article introduces some basic Unicode programming knowledge and clears up confusion that many people (including me) have had on this subject. It is worth reading for anyone who programs with Visual C++ and/or MFC.
What is Unicode?
Unicode is a widely used solution to the 256-character limit of single-byte character codes. Strictly speaking, ASCII itself defines only 128 characters; the 8-bit extended character sets built on it hold 256 characters, numbered 0-255. They include upper- and lower-case letters, digits, and a small number of other characters such as punctuation and currency symbols. That is enough for most Latin-script languages. However, many Asian and Eastern languages use far more than 256 characters, some of them more than 1,000. Unicode came into being to break through this limit and let programs handle more than 256 characters. It represents characters from a much larger range by mapping numeric codes wider than one byte to the character sets of many languages; in Visual C++ a Unicode character is stored in a two-byte wchar_t.
Why should software developers use Unicode?
If you write programs with Visual C++, Unicode compatibility determines whether your program is international, that is, whether your application targets a local market or the international market. Once you have made that decision, you have to implement the specific details in your code. Fortunately, Visual C++ provides many built-in features to support Unicode, and you can take advantage of them when creating a project.
Before generating the application framework code, AppWizard lets the developer decide whether to support Unicode. The Win32 SDK contains data types that follow the Unicode encoding rules, and MFC provides macros that convert generic text into Unicode data types. Developers only need to change a few coding habits to write Unicode-ready applications with ease.
Strings
C programmers generally declare a character array for a string with the char keyword, like this:
char str[100];
and declare a function like this:
void strcpy(char *out, char *in);
To change the declarations above so that they support the double-byte Unicode character set, use the following instead:
wchar_t str[100];
or
void wcscpy(wchar_t *out, wchar_t *in);
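As a minimal sketch of the wide-character declarations above, the following standard C++ fragment copies a wide string literal into a wchar_t buffer with wcscpy and measures it with wcslen. The function name copy_wide_greeting is hypothetical and exists only for illustration; the L prefix marks a wide (wchar_t) string literal.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>   // std::wcscpy, std::wcslen

// Hypothetical helper for illustration: copies a wide literal into the
// caller's buffer and returns the number of wide characters copied.
// The caller must supply room for at least 6 wide characters.
std::size_t copy_wide_greeting(wchar_t *out)
{
    const wchar_t *in = L"Hello";   // L prefix: wide string literal
    std::wcscpy(out, in);           // wide-character counterpart of strcpy
    return std::wcslen(out);        // counts characters, not bytes
}
```

A caller would declare wchar_t str[100]; and pass str to the function. Note that sizeof(str) is then 100 * sizeof(wchar_t) bytes, so code that confuses byte counts with character counts is a common source of bugs when converting char code to wchar_t.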
In addition, Microsoft supports Unicode through preprocessor directives. Whenever you create a new project with Visual C++ and choose which character set to support, AppWizard inserts the corresponding preprocessor directives into the header file. These directives tell the compiler which character set to support. If the code uses the generic data types provided by Visual C++, the compiler replaces each generic data type with the data type that corresponds to the chosen character set. This makes it easy to recompile the code into a program that supports another character set.
To enable Unicode in Visual C++ 6.0, open the project, choose "Project | Settings" from the main menu to open the Project Settings dialog, select the "C/C++" tab, and add the UNICODE or _UNICODE preprocessor macro in the "Preprocessor definitions" edit box, as shown in Figure 1.
Figure 1: Project Settings dialog
Note: What is the difference between UNICODE and _UNICODE? The former, without the underscore, is used by the Windows header files; the latter, with the leading underscore, is used by the C runtime header files. Projects normally define both so that the two sets of headers agree.
In the code, replace every char with TCHAR and every char * with LPTSTR, and write anything in double quotes (such as "VCKBASE ONLINE JOURNAL") with the TEXT macro: TEXT("VCKBASE ONLINE JOURNAL"). The main job of the TEXT macro is to mark the string as a double-byte (wide) string when the UNICODE / _UNICODE preprocessor definition is present, and as an ANSI string otherwise. TEXT is defined as follows:
TEXT(
    LPTSTR string  // ANSI or Unicode string
);
The string parameter is a pointer to the string that is to be interpreted as Unicode or ANSI. In its documentation, Microsoft provides several data types, including generic types, that are compatible with both ASCII and Unicode. For details, see the topics "Generic Data Types" and "Data Types" in Microsoft's online documentation.
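The mechanism behind TEXT can be sketched in portable C++. The names MY_UNICODE, my_tchar, MY_TEXT and my_tcslen below are hypothetical stand-ins for UNICODE, TCHAR, TEXT and a generic length routine; the real definitions live in the Windows and C runtime headers, and this is only an illustration of the pattern, not a copy of them.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: MY_UNICODE, my_tchar and MY_TEXT are hypothetical stand-ins
// for UNICODE, TCHAR and TEXT, used only to illustrate the pattern.
#define MY_UNICODE   // comment this line out to build the narrow (ANSI) variant

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT(s) L##s      // paste the L prefix onto the string literal
#else
typedef char my_tchar;
#define MY_TEXT(s) s         // leave the literal unchanged
#endif

// Returns the length, in characters, of a generic-text string.
std::size_t my_tcslen(const my_tchar *s)
{
    std::size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

With this in place, const my_tchar *title = MY_TEXT("VCKBASE ONLINE JOURNAL"); compiles in both variants, and only the single macro definition decides whether the literal is a char string or a wchar_t string.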
Example code
The following simple examples explore Unicode programming further.
"Hello World" using the ASCII character set:
//***********************
// "Hello World!" implemented with MFC
//***********************
// hello.cpp
#include <afxwin.h>
// Declare the application class
class CHelloApp : public CWinApp
{
public:
    virtual BOOL InitInstance();
};
// Create an instance of the application class
CHelloApp HelloApp;
// Declare the main window class
class CHelloWindow : public CFrameWnd
{
    CStatic *cs;
public:
    CHelloWindow();
};
// The InitInstance function is called each
// time the application first executes.
BOOL CHelloApp::InitInstance()
{
    m_pMainWnd = new CHelloWindow();
    m_pMainWnd->ShowWindow(m_nCmdShow);
    m_pMainWnd->UpdateWindow();
    return TRUE;
}
// The constructor for the window class
CHelloWindow::CHelloWindow()
{
    // Create the window itself
    Create(NULL, "Hello World!", WS_OVERLAPPEDWINDOW,
        CRect(0, 0, 200, 200));
    // Create a static label
    cs = new CStatic();
    cs->Create("Hello World", WS_CHILD | WS_VISIBLE | SS_CENTER,
        CRect(50, 80, 150, 150), this);
}
To make the code above support the Unicode character set, the string literals must be changed to their Unicode equivalents. The way to do this is to wrap each literal in the TEXT macro, which tells the preprocessor to check which character set is in use:
// The constructor for the window class
CHelloWindow::CHelloWindow()
{
    // Create the window itself
    Create(NULL, TEXT("Hello World!"), WS_OVERLAPPEDWINDOW,
        CRect(0, 0, 200, 200));
    // Create a static label
    cs = new CStatic();
    cs->Create(TEXT("Hello World!"), WS_CHILD | WS_VISIBLE | SS_CENTER,
        CRect(50, 80, 150, 150), this);
}
When the preprocessor encounters a generic data type, it checks for the _UNICODE definition through the afxwin.h header file and then substitutes the corresponding data type according to whether Unicode is defined.
The following example uses a Win32 API function and the generic data types to set the volume label of the C: drive.
//***********************
// Set the C: drive volume label
//***********************
// drvsvl.cpp
#include <windows.h>
#include <iostream>
using namespace std;
int main()
{
    BOOL success;
    char volumeName[MAX_PATH];
    cout << "Enter the new C drive volume label: ";
    cin >> volumeName;
    success = SetVolumeLabel("C:\\", volumeName);
    if (success)
        cout << "success\n";
    else
        cout << "error code: " << GetLastError() << endl;
    return 0;
}
With the TCHAR data type, the character array declared at the top of this code becomes an array of two-byte characters in a Unicode build. The TEXT macro is again used for the string literals:
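// cout and cin are narrow-character streams; a Unicode build needs the
// wide streams instead. tcout and tcin are local helper macros defined
// here, not part of the Windows headers.
#ifdef _UNICODE
#define tcout wcout
#define tcin  wcin
#else
#define tcout cout
#define tcin  cin
#endif
int main()
{
    BOOL success;
    TCHAR volumeName[MAX_PATH];
    tcout << TEXT("Enter the new C drive volume label: ");
    tcin >> volumeName;
    success = SetVolumeLabel(TEXT("C:\\"), volumeName);
    if (success)
        tcout << TEXT("success\n");
    else
        tcout << TEXT("error code: ") << GetLastError() << endl;
    return 0;
}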
Generic data types in Visual C++
Visual C++ provides several MFC-specific data types for creating internationalized applications. These definitions are generic and work with Unicode, ASCII, DBCS (double-byte character set) and MBCS (multibyte character set). Due to space limitations this article does not cover all of these character sets; see the relevant documentation for details. MFC implements support for them transparently. Which concrete type a generic data type maps to, and how, is determined by the project settings: the default is ASCII mode, and the other options are MBCS, DBCS or Unicode. This article is mainly about Unicode, so only the mappings for ASCII and Unicode are listed below.
Table 1:
Generic MFC data type | Maps to in ASCII | Maps to in Unicode | Comment
_TCHAR | char | wchar_t | _TCHAR is a mapping macro: when _UNICODE is defined the data type maps to wchar_t; otherwise it maps to char.
_T or _TEXT | char constant string | wchar_t constant string | The two macros are functionally identical. In ASCII mode they are ignored, that is, removed by the preprocessor; when _UNICODE is defined they convert a constant string into its Unicode equivalent.
LPTSTR | char *, LPSTR (Win32) | wchar_t * | Portable 32-bit string pointer. It maps the character type to the type chosen in the project settings.
LPCTSTR | const char *, LPCSTR (Win32) | const wchar_t * | Portable 32-bit constant string pointer. It maps the constant character type to the type chosen in the project settings.
With the generic data types listed in Table 1, developers can ensure that a project is never tied to a single character set. The generic types act as placeholders that are replaced with concrete types at compile time, so the application can be built to run in either ASCII or Unicode mode. One point deserves attention, however: these generic data types are Microsoft-specific and are not part of the ANSI standard. For more detailed descriptions of them, refer to the MSDN Library documentation.
Technical notes
To build an MFC program that supports Unicode, you must have the Unicode versions of the MFC libraries. They are an optional component in a custom installation of Visual C++.
One point is very important: on the surface, not using Unicode does not affect how the program runs. That is, the code shown above produces a normally running program whether or not the _UNICODE build option is set. Problems arise when developers use Win32 API functions that exist in multiple versions.
For such functions (any Win32 API function that takes characters or strings as parameters), the compiler selects the right version according to whether the Unicode macro is defined (UNICODE for the Windows headers, normally set together with _UNICODE). If it is not defined, the compiler calls the ASCII (ANSI) version of the function by default.
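The selection between function versions is done with macros: the Windows headers declare a narrow version with an A suffix and a wide version with a W suffix (for example SetVolumeLabelA and SetVolumeLabelW), and map the unsuffixed name to one of them. The sketch below imitates that pattern in portable C++. DemoLabelA, DemoLabelW, DemoLabel, DEMO_TEXT and DEMO_UNICODE are hypothetical names invented for this illustration.

```cpp
#include <cassert>

// Assumption: all Demo* names are hypothetical and only mimic the pattern
// used by the Windows headers; DEMO_UNICODE stands in for UNICODE.
#define DEMO_UNICODE   // comment this line out to select the narrow version

// Narrow ("A") version: takes char strings and reports which version ran.
char DemoLabelA(const char *) { return 'A'; }

// Wide ("W") version: takes wchar_t strings.
char DemoLabelW(const wchar_t *) { return 'W'; }

#ifdef DEMO_UNICODE
#define DemoLabel DemoLabelW     // generic name resolves to the wide version
#define DEMO_TEXT(s) L##s
#else
#define DemoLabel DemoLabelA     // generic name resolves to the narrow version
#define DEMO_TEXT(s) s
#endif
```

A call such as DemoLabel(DEMO_TEXT("DATA")) compiles in both configurations. Passing a plain "DATA" literal to DemoLabel while DEMO_UNICODE is defined fails to compile, which is the kind of error that appears when string literals are not wrapped in the TEXT macro in a Unicode build.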
Conclusion