Making fopen() open files created under a different code page
Question: Under English Win2000, an application has to operate on a file whose name contains Chinese characters.
Analysis: The file was created under Chinese Win2000 and then copied to English Win2000, so the first suspicion is a code page problem. The application uses a third-party toolkit that requires an ANSI-encoded file name rather than a Unicode one. Further analysis showed that the toolkit opens the file with fopen(). So the question becomes: how can fopen() be made to work correctly across different code pages? To solve the problem, we must first understand its exact cause. According to the documentation, Win2000's file system actually records two names for each file: a long file name and a short file name. The long file name is stored in Unicode, and the short file name is stored in ANSI. The operations described above evidently work with the long file name. With that, the problem can be explained:
The code page of Chinese Win2000 is 936, and for the characters in this code page the ANSI encoding and the Unicode encoding correspond one to one. The long file name is Unicode, and fopen() requires an ANSI file name as its parameter, so using fopen() always involves a conversion between Unicode and ANSI. This conversion may be done for you, or you may do it yourself. For example, if you get the file name with GetOpenFileNameA(), you do not have to care about the conversion; if you have to use GetOpenFileNameW(), the next line of your code will probably be a call to WideCharToMultiByte(). Either way, fopen() works without any problem.
After the file is moved to English Win2000, the problem appears.
The ANSI code page of English Win2000 is 1252 (437 is its OEM code page). The Unicode code point of a Chinese character has no counterpart in that code page; such a character is called an unmappable character. In general, an unmappable character is converted to '?'. So after the conversion, the original information in the file name is lost. This is why fopen() on English Win2000 cannot open files with Chinese names.
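The loss of information described above can be illustrated with a small, portable sketch. It is a simplified model and not the real WideCharToMultiByte(): the function names toy_encode_char() and toy_wide_to_ansi() are hypothetical, and the "code page" is assumed to cover only 7-bit ASCII.

```c
#include <stddef.h>
#include <wchar.h>

/* Simplified model, NOT the real WideCharToMultiByte(): a hypothetical
   single-byte code page that can represent only 7-bit ASCII. Anything
   else is an unmappable character and becomes '?'. */
char toy_encode_char(wchar_t wc)
{
    return ((unsigned long)wc < 0x80UL) ? (char)wc : '?';
}

/* Converts a wide string to the toy code page and NUL-terminates dst.
   Returns the number of characters that were replaced by '?'.
   dst must have room for dst_size bytes. */
size_t toy_wide_to_ansi(const wchar_t *src, char *dst, size_t dst_size)
{
    size_t lost = 0;
    size_t i = 0;

    if (dst_size == 0)
        return 0;

    for (; src[i] != L'\0' && i + 1 < dst_size; ++i) {
        dst[i] = toy_encode_char(src[i]);
        if ((unsigned long)src[i] >= 0x80UL)
            ++lost;
    }
    dst[i] = '\0';
    return lost;
}
```

Passing a name made of two Chinese characters followed by ".txt" yields "??.txt" with two characters reported lost. Once every non-ASCII character has collapsed to the same '?', the original name can no longer be recovered, which is exactly why the file cannot be found afterwards.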
Solution:
As mentioned earlier, Win2000's file system stores short file names in ANSI encoding. We can use this to solve the problem.
How do we get the short file name? There is an API for this:
DWORD GetShortPathName(LPCTSTR lpszLongPath, LPTSTR lpszShortPath, DWORD cchBuffer);
GetShortPathName() takes the long file name (actually the path) as a parameter, which means we can only use GetShortPathNameW(), so the short file name we get back is also Unicode. Isn't this the same problem as with the long file name?
Note one fact: the same ANSI string has a different meaning under different code pages. However, under any code page, converting an ANSI string to Unicode and then converting it back from Unicode to ANSI leaves the string unchanged, with no information lost. So we can safely use WideCharToMultiByte() to convert the short file name to ANSI and then hand it to fopen().
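The round-trip argument can be checked on a toy model. The sketch below is only an illustration under stated assumptions: TOY_PAGE_A and TOY_PAGE_B are hypothetical single-byte code pages invented for this example (they are not real Windows code pages), and the toy_* functions stand in for MultiByteToWideChar() and WideCharToMultiByte().

```c
#include <stddef.h>

/* Hypothetical single-byte code pages, for illustration only: both keep
   ASCII as-is, but map bytes 0x80..0xFF to different Unicode ranges. */
enum toy_page { TOY_PAGE_A, TOY_PAGE_B };

unsigned long toy_base(enum toy_page p)
{
    return (p == TOY_PAGE_A) ? 0x0400UL : 0x4E00UL;
}

/* "ANSI" byte -> Unicode code point under the given toy page. */
unsigned long toy_decode(enum toy_page p, unsigned char b)
{
    return (b < 0x80) ? (unsigned long)b : toy_base(p) + (b - 0x80UL);
}

/* Unicode code point -> "ANSI" byte; '?' when the page cannot represent it. */
unsigned char toy_encode(enum toy_page p, unsigned long cp)
{
    if (cp < 0x80UL)
        return (unsigned char)cp;
    if (cp >= toy_base(p) && cp < toy_base(p) + 0x80UL)
        return (unsigned char)(cp - toy_base(p) + 0x80UL);
    return '?';
}

/* Returns 1 if every byte of s survives decode followed by encode
   under the same page p, 0 otherwise. */
int toy_round_trip_ok(enum toy_page p, const unsigned char *s, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (toy_encode(p, toy_decode(p, s[i])) != s[i])
            return 0;
    }
    return 1;
}
```

In this model the byte 0xD6 decodes to a different code point under each page, yet decoding and re-encoding under the same page always returns 0xD6. Only crossing pages (decoding under one, encoding under the other) produces '?', which mirrors the long-file-name failure described earlier.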
Sample code:
#include <windows.h>
#include <commdlg.h>   /* GetOpenFileNameW(); link with comdlg32.lib */
#include <stdio.h>

int main(void)
{
    FILE *file = NULL;
    wchar_t wsShortName[1000] = {0};
    char sShortName[1000] = {0};
    OPENFILENAMEW ofn;
    wchar_t szFile[MAX_PATH];

    /* Initialize OPENFILENAME */
    ZeroMemory(&ofn, sizeof(ofn));
    ofn.lStructSize = sizeof(ofn);
    ofn.hwndOwner = NULL;
    ofn.lpstrFile = szFile;
    ofn.lpstrFile[0] = L'\0';
    ofn.nMaxFile = MAX_PATH;   /* size in characters, not bytes */
    ofn.lpstrFilter = L"All\0*.*\0Text\0*.TXT\0";
    ofn.nFilterIndex = 1;
    ofn.lpstrFileTitle = NULL;
    ofn.nMaxFileTitle = 0;
    ofn.lpstrInitialDir = NULL;
    ofn.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

    if (GetOpenFileNameW(&ofn)) {
        /* Long (Unicode) path -> short path, still in Unicode */
        if (GetShortPathNameW(ofn.lpstrFile, wsShortName, 1000) != 0) {
            /* Short path -> ANSI, which is what fopen() expects */
            WideCharToMultiByte(CP_ACP, 0, wsShortName, -1,
                                sShortName, 1000, NULL, NULL);
            file = fopen(sShortName, "rb");
        }
    }

    /* file stays NULL if the dialog was cancelled or a call failed */
    if (file != NULL)
        fclose(file);
    return 0;
}