Unicode vs. ANSI
The 32-bit versions of Visual Basic handle strings in Unicode; that is, VB stores strings internally in Unicode format.
What is Unicode? Simply put, every character, whether a Chinese character or an English letter, is stored in 2 bytes and counts as one "character". Therefore:
Len("大家好") and Len("ABC")
both return 3, because "大" and "A" are each one character.
For some kinds of Chinese data, such as plain-text data files, this is a disaster. There you must locate each field by its byte position, and Unicode breaks all of that. For example:
Len("Good Morning") returns 12, while Len("今天天氣很好") ("The weather is very nice today") returns 6, even though both strings occupy 12 bytes in an ANSI text file.
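The gap between characters and bytes can be checked with a small sketch. It is written in Python only because Python is easy to run; it is not VB code. Python's len() counts characters like VB's Len(). Encoding to code page 950 gives the byte count a file would hold. We assume cp950 as the Traditional Chinese ANSI code page, and the helper names are hypothetical.

```python
# Assumption: the "ANSI" encoding is Windows code page 950 (Traditional Chinese Big5).
ANSI = "cp950"


def char_len(s: str) -> int:
    """Number of characters, analogous to VB's Len()."""
    return len(s)


def ansi_byte_len(s: str) -> int:
    """Number of bytes the string occupies in an ANSI text file."""
    return len(s.encode(ANSI))


for text in ("大家好", "ABC", "Good Morning", "今天天氣很好"):
    print(repr(text), char_len(text), ansi_byte_len(text))
```

Each Chinese character counts as 1 character but 2 bytes, so "今天天氣很好" reports 6 characters and 12 bytes.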
For beginners, treating every character uniformly is a great convenience when writing VB programs. Once you have to process Chinese data byte by byte, though, it is quite a blow. Don't worry: once you know a few more functions, Chinese processing is manageable.
Which functions? The most important one is StrConv. Its syntax is:
StrConv(string, conversion)
The conversion values used here are:
vbUnicode       converts an ANSI string to Unicode
vbFromUnicode   converts a Unicode string to ANSI
After a string has been converted to ANSI, process it with the string functions whose names end in B, such as LeftB, RightB, MidB, ChrB, InStrB, LenB and InputB.
When you have finished, convert the result back to Unicode so you can use the ordinary string functions again.
Is that clear? If not, look at the example below:
[●] A simple example
The basic example below should give you a feel for how VB handles strings.
Private Sub Command1_Click()
    Dim sUnicode As String
    Dim sAnsi As String

    ' Unicode operations
    ' Fields: name, ID number, birth date, address (No. 100 Zhongshan Road, Taipei), phone
    sUnicode = "王小明,A123456789,651023,台北市中山路100號,(02)2345678"
    Debug.Print Len(sUnicode)              ' prints 44
    Debug.Print Mid$(sUnicode, 5, 10)      ' prints A123456789
    Debug.Print InStr(sUnicode, "台北市")   ' prints 23

    ' Convert the Unicode string to ANSI
    sAnsi = StrConv(sUnicode, vbFromUnicode)

    ' ANSI operations
    Debug.Print LenB(sAnsi)                               ' prints 54
    Debug.Print MidB$(sAnsi, 8, 10)                       ' prints ????? because it was not converted back to Unicode
    Debug.Print StrConv(MidB$(sAnsi, 8, 10), vbUnicode)   ' prints A123456789; the conversion back to Unicode is required
    ' prints 26 (a byte position); "台北市" must also be converted to ANSI, otherwise it will not be found
    Debug.Print InStrB(sAnsi, StrConv("台北市", vbFromUnicode))
End Sub
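To double-check the values in the example, the sketch below replays the same operations in Python:
- A str stands in for sUnicode.
- bytes encoded as cp950 (an assumed ANSI code page) stand in for sAnsi.
- Positions are shifted to VB's 1-based convention.

```python
ANSI = "cp950"  # assumption: Traditional Chinese ANSI code page
record = "王小明,A123456789,651023,台北市中山路100號,(02)2345678"

# Unicode (character) view: Len, Mid$, InStr
print(len(record))                 # like Len(sUnicode)
print(record[4:14])                # like Mid$(sUnicode, 5, 10)
print(record.find("台北市") + 1)    # like InStr(sUnicode, "台北市")

# ANSI (byte) view: LenB, MidB$, InStrB
ansi = record.encode(ANSI)
print(len(ansi))                             # like LenB(sAnsi)
print(ansi[7:17].decode(ANSI))               # like StrConv(MidB$(sAnsi, 8, 10), vbUnicode)
print(ansi.find("台北市".encode(ANSI)) + 1)   # like InStrB(sAnsi, ...)
```

The outputs are 44, A123456789, 23, 54, A123456789 and 26. The byte offset of "台北市" is 26 rather than 23 because the three Chinese characters before it each take one extra byte.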
[●] Reading a text file
VB tip collections often suggest this fast way to read a whole file:
Private Sub Command1_Click()
    Dim sFile As String
    Open "C:\filename.txt" For Input As #1
    sFile = Input$(LOF(1), #1)
    Close #1
End Sub
Unfortunately, if the file contains Chinese, this code raises the error "Input past end of file". LOF returns the length of the file in bytes, but Input$ reads a number of characters. Each Chinese character occupies two bytes in the file, so the file holds fewer characters than bytes, and Input$ tries to read past the end.
To solve this problem, use two functions, StrConv and InputB:
Private Sub Command1_Click()
    Dim sFile As String
    Open "C:\filename.txt" For Input As #1
    sFile = StrConv(InputB$(LOF(1), #1), vbUnicode)
    Close #1
End Sub
This version first reads the file with InputB$, which reads bytes and returns the data in ANSI format, and then uses StrConv to convert it to Unicode.
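The same idea, reading raw bytes first and converting them afterwards, looks like this in a Python sketch:
- read_ansi_text is a hypothetical helper.
- cp950 is an assumed code page.
- The demo writes its own temporary file rather than using C:\filename.txt.

```python
import os
import tempfile

ANSI = "cp950"  # assumption: Traditional Chinese ANSI code page


def read_ansi_text(path: str) -> str:
    """Read every byte (like InputB$(LOF(1), #1)), then convert (like StrConv(..., vbUnicode))."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(ANSI)


# Demo: write a small ANSI file to a temporary path, then read it back.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("今天天氣很好\r\nGood Morning".encode(ANSI))
    path = tmp.name

try:
    size = os.path.getsize(path)   # like LOF(1): a byte count
    text = read_ansi_text(path)    # a character count after conversion
    print(size, len(text))         # 26 bytes, but only 20 characters
finally:
    os.remove(path)
```

Asking for 26 characters from a file that holds only 20 is exactly the "Input past end of file" situation.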
[●] Random-access data files
Many text files store data in fixed-position fields, as in the following format:
Wang Xiaomin   650110  No. 100, Zhongshan Road, Taipei                      (02) 1234567
Zhang Daxing   660824  No. 23, Guangdong Street, Dajia Town, Hualien County  (03) 9876543
......
How do you handle this kind of file? You need a Type (user-defined type) and a Byte array.
Private Type tagRecord
    UserName(5) As Byte    ' 6 bytes (indices 0 to 5): three Chinese characters at 2 bytes each
    ' (the remaining fixed-width fields, such as birth date, address and phone, would follow here)
End Type

Private Sub Command1_Click()
    Dim uRecord As tagRecord

    Open "C:\filename.dat" For Random As #1 Len = LenB(uRecord)
    Get #1, 2, uRecord    ' read the second record

    With uRecord          ' With ... End With saves typing uRecord. each time
        Debug.Print .UserName                        ' prints ???
        Debug.Print StrConv(.UserName, vbUnicode)    ' prints "Zhang Daxing" (in Chinese characters)
    End With

    Close #1
End Sub
In this example you must use a Byte array, because only a Byte array lets you address each byte position exactly. Locating fields with a String does not work here, so keep that in mind! Note that data read into a Byte array is in ANSI format. If you want to process it further, remember to convert it to Unicode first.
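The fixed-width idea can be sketched in Python by slicing each record at byte offsets and converting each field separately. Several things here are assumptions:
- The layout (a 6-byte name, a 6-byte birth date, then CR LF).
- The sample data.
- The cp950 code page.
- The helper get_record.

This is not the real layout of filename.dat.

```python
ANSI = "cp950"  # assumption: Traditional Chinese ANSI code page

# Hypothetical layout: (field name, width in bytes); each record ends with CR LF.
LAYOUT = [("name", 6), ("birth", 6)]
RECORD_LEN = sum(width for _, width in LAYOUT) + 2


def get_record(data: bytes, n: int) -> dict:
    """Return record n (1-based, like Get #1, n, ...), converting each field to Unicode."""
    start = (n - 1) * RECORD_LEN
    rec = data[start:start + RECORD_LEN]
    fields, pos = {}, 0
    for name, width in LAYOUT:
        fields[name] = rec[pos:pos + width].decode(ANSI).rstrip()
        pos += width
    return fields


data = "王小明650110\r\n張大興660824\r\n".encode(ANSI)  # hypothetical sample file contents
print(get_record(data, 2))
```

Slicing the bytes, rather than the decoded text, keeps every field boundary in the right place regardless of how many Chinese characters precede it.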
[●] Using Byte arrays
Apart from cases like the one above, where exact byte positions are needed, plain-text processing rarely requires Byte arrays. Byte arrays are mainly used for binary data, which we will discuss in another article.
See it! As long as you are familiar with using StrConv, you can change freely between Unicode and ANSI formats. I believe that when you have finished reading this article, you should no longer worry about processing Chinese! Returns the problem string Chinese in the string Chinese, the string of VB is using Unicode, and we generally use the ASCII Code. Where is this difference? The length of the Unicode is 2 Byte, and ASCII is a byte. If I will write the string of VB to the file, sometimes I can't think of the result. For example: text1.text = "This is an abc" len5 = len (STR5) If our Access database has a column length is 10 Byte, we set maxlength = 10 in TextBox, but the above example get Len5 is 7, rather than what we think is 11, because no matter Chinese or English, VB will be stored in Unicode, so the length of STR5 is 7 "characters", and the maximum length limit of Text1 is 10, 7 is not exceeded 10, the user can still enter, but when archive, 11 BYTE exceed 10 Byte, so it will be wrong. However, some people have found that when using RS232 to pass the information, the other end host is an ASCII encoded machine. If we use String to pass, it can be passed, in fact, that is VB when transmitting DATA, will do conversion Make our program design more convenient, but if the information is binary, it is big. For example, in a string, it is used to transmit ASCII greater than 128, often some problems, because ASC (Chr (129)) = 0, so that we cannot use a CHR () instruction to place information. (In fact, you can use ChRW (129) to get the value, and use ascw () to get the value, add a W representative is Word's operation), this time, only use Byte Array.
1.Unicode transfer to bytearydim Byteary () AS BYTE DIM STR5 AS STRING DIM I as long str5 = "this abc" Byteary = str5 for i = lbound (byteary) to ubound (Byteary) Debug.print Byteary (i) 'get 25 144 97 0 98 0 99 0 NEXT IdeBug.Print Len (STR5), LENB (STR5) 'So 4 8, you can see the characteristics of Unicode, the program should be changed, using strconv () to convert DIM Byteary () AS BYTEDIM STR5 AS STRING DIM I as long str5 = "this abc" Byteary = strconv (str5, vbfromunicode) for i = lbound (byteary) to Ubound (Byteary) Debug.print Byteary (i) 'Get 25 144 97 98 99 Next IDebug .Print lenb (str5, vbfromunicode) 'Get 52.Byteary Revert to Unicode Using StrConv () Convert Dim Byteary (10) AS Byte Dim Str5 As StringByteary (0) = 25 Byteary (1) = 144 Byteary (2) = 97 Byteary (3) = 98Byteary (4) = 99 str5 = strconv (Byteary, vbunicode) 3. Some useful functions Substr () Chinese Culture string, relative MID () strlen () Chinese critical strings Relative Len () strLeft () Take the left string, relative to Left () strright () Take the right string, relative Right () ischinese () Check Somewhere Chinese Public Function Substr (ByVal TSTR AS String, Start AS Integer, Optional Leng As Variant AS Stringdim Tmpstr As Stringif Ismissing (Leng) THENTMPSTR = STR Conv (MIDB (StrConv (TSTR, VBFROMUNICODE), START, VBUNICODE ELSETMPSTMPSTR = STRCONV (MIDB (StrConv (TSTR, VBFROMUNICODE), START, Leng), Vbunicode) end ifsubstr = TmpStrend Function
Public Function StrLen(ByVal tStr As String) As Integer
    StrLen = LenB(StrConv(tStr, vbFromUnicode))
End Function
Public Function StrLeft(ByVal str5 As String, ByVal len5 As Long) As String
    Dim tmpstr As String
    tmpstr = StrConv(str5, vbFromUnicode)
    tmpstr = LeftB(tmpstr, len5)
    StrLeft = StrConv(tmpstr, vbUnicode)
End Function
Public Function StrRight(ByVal str5 As String, ByVal len5 As Long) As String
    Dim tmpstr As String
    tmpstr = StrConv(str5, vbFromUnicode)
    tmpstr = RightB(tmpstr, len5)
    StrRight = StrConv(tmpstr, vbUnicode)
End Function
Public Function IsChinese(ByVal AsciiV As Integer) As Boolean
    ' Pass Asc() of one character: a double-byte character's code needs more than two hex digits
    If Len(Hex$(AsciiV)) > 2 Then
        IsChinese = True
    End If
End Function
Length of a mixed string
In VB every character, Chinese or English, is stored in two bytes: Len("漢1") = 2 and LenB("漢1") = 4. In many cases, however, we want a Chinese character to count as 2 and an English character as 1. Use:
LenB(StrConv("漢1", vbFromUnicode))    ' returns 3
Removing specified text from a string
This function removes every occurrence of Search from the string S. (Note: if S is "AAABBB" and Search is "AB", what is the result?)
Function StringCleaner(S As String, Search As String) As String
    Dim i As Integer, Res As String
    Res = S
    If Len(Search) = 0 Then    ' InStr(Res, "") always returns 1, which would loop forever
        StringCleaner = Res
        Exit Function
    End If
    Do While InStr(Res, Search) > 0
        i = InStr(Res, Search)
        Res = Left(Res, i - 1) & Mid(Res, i + Len(Search))
    Loop
    StringCleaner = Res
End Function
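The note's puzzle can be answered with a Python sketch of the same loop. string_cleaner is a hypothetical name mirroring StringCleaner. It repeatedly removes the first match, so each removal can create a new match, which a single replace pass does not do.

```python
def string_cleaner(s: str, search: str) -> str:
    """Repeatedly remove the first occurrence of search until none remain."""
    if not search:
        return s  # an empty pattern would otherwise loop forever
    res = s
    i = res.find(search)
    while i != -1:
        res = res[:i] + res[i + len(search):]
        i = res.find(search)
    return res


print(repr(string_cleaner("AAABBB", "AB")))   # '' : AAABBB -> AABB -> AB -> ''
print(repr("AAABBB".replace("AB", "")))       # 'AABB': a single pass does not cascade
```

So for S = "AAABBB" and Search = "AB", the looping function removes everything, while a one-pass replacement would leave "AABB".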