Signed and unsigned in the C language

xiaoxiao2021-03-06  69

How GCC handles signed and unsigned characters: an analysis
Excerpted from JDEV, Bloves, 2004-01-16 18:46, http://ccb.77jj.com/User03/bbs/ccb/index.cgi

How does GCC deal with signed and unsigned values? The following article may contain errors or misunderstandings. If you find one after reading it carefully, please point it out. Thank you.

Below, two nearly identical programs are used to analyze how GCC processes each case and what actually happens in the machine. Let's first look at program_one.c and program_two.c, and then at how the compiler handles them.

program_one.c
-------------------------------
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char p = 255;      // 0xff
        printf("%d", p);   // print the value of p to the screen as a decimal integer
                           // p is a signed number
    }

program_two.c
-------------------------------
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        unsigned char p = 255;   // 255 is 0xff in hexadecimal
        printf("%d", p);
    }

Let's discuss. In C, character types come in a signed and an unsigned form. With GCC on x86, a plain declaration such as char ch; is signed (the C standard leaves the signedness of plain char to the implementation). A declaration of the form unsigned char ch; is unsigned. So what is the difference between the two?

Compile program_one.c, then use the disass command in GDB to disassemble main, as follows:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char p = 255;      // 0xff
        printf("%d", p);   // print the value of p to the screen as a decimal integer
                           // here 255 is stored in a signed type
    }

Dump of assembler code for function main:
    0x401284:  push   %ebp
    0x401285:  mov    %esp,%ebp
    0x401287:  sub    $0x8,%esp
    0x40128a:  and    $0xfffffff0,%esp
    0x40128d:  mov    $0x0,%eax
    0x401292:  mov    %eax,0xfffffff8(%ebp)
    0x401295:  mov    0xfffffff8(%ebp),%eax
    0x401298:  call   0x402920 <_alloca>
    0x40129d:  call   0x401360 <__main>
    0x4012a2:  movb   $0xff,0xffffffff(%ebp)    // copy the byte 0xff to 0xffffffff(%ebp), that is -1(%ebp), the slot of p
    0x4012a6:  sub    $0x8,%esp
    0x4012a9:  movsbl 0xffffffff(%ebp),%eax     // sign-extend the byte 0xff to a long in %eax; %eax is now 0xffffffff
    0x4012ad:  push   %eax                      // the actual argument
    0x4012ae:  push   $0x401280                 // not clear yet what this is; presumably something to do with the format
    0x4012b3:  call   0x4029c0                  // printf prints the value pushed from %eax; 0xffffffff is -1 in decimal,
                                                // so -1 is displayed on the screen after running
    0x4012b8:  add    $0x10,%esp

movsbl is the "sign-extend byte to long" instruction.

Now disassemble program_two.c. The result, with analysis, is as follows:
--------------------------------------------
    0x401284:  push   %ebp
    0x401285:  mov    %esp,%ebp
    0x401287:  sub    $0x8,%esp
    0x40128a:  and    $0xfffffff0,%esp
    0x40128d:  mov    $0x0,%eax
    0x401292:  mov    %eax,0xfffffff8(%ebp)
    0x401295:  mov    0xfffffff8(%ebp),%eax
    0x401298:  call   0x402930 <_alloca>
    0x40129d:  call   0x401370 <__main>
    0x4012a2:  movb   $0xff,0xffffffff(%ebp)    // 0xff here is the character code held in p;
                                                // it is stored in the stack slot 0xffffffff(%ebp)
    0x4012a6:  sub    $0x8,%esp
    0x4012a9:  mov    $0x0,%eax                 // clear %eax first ...
    0x4012ae:  mov    0xffffffff(%ebp),%al      // ... then load the byte into %al; the upper bits stay 0 (no sign extension)
    0x4012b1:  push   %eax
    0x4012b2:  push   $0x401280                 // the address of an argument. GDB's disassembly shows nothing more here,
                                                // so I generated assembly with gcc -S, and this is what I saw:
--------------------------------------------
        .file   "test.c"
        .def    ___main;  .scl 2;  .type 32;  .endef
        .text
    LC0:                         // this is the symbol for the address $0x401280
        .ascii "%c\0"
        .align 2
    .globl _main
        .def    _main;  .scl 2;  .type 32;  .endef
    _main:
        pushl  %ebp
        movl   %esp, %ebp
        subl   $8, %esp
        andl   $-16, %esp
        movl   $0, %eax
        movl   %eax, -8(%ebp)
        movl   -8(%ebp), %eax
        call   __alloca
        call   ___main
        movb   $9, -1(%ebp)
        subl   $8, %esp
        movl   $0, %eax
        movb   -1(%ebp), %al
        pushl  %eax
        pushl  $LC0              // so it really is this: the symbol for the address $0x401280
                                 // From the disassembly alone you cannot see the difference between
                                 // printf("%d", p); and printf("%c", p); both just push the address of a format string.
                                 // Inside printf the argument is processed according to the format;
                                 // with %d it simply prints the value of p.
--------------------------------------------
    0x4012b7:  call   0x4029d0                  // print 0xff to the screen in integer format;
                                                // the integer value of 0xff is 255, so 255 is printed
    0x4012bc:  add    $0x10,%esp
--------------------------------------------

From the analysis of these two programs it is not hard to see that the compiler treats signed and unsigned values differently. For an unsigned char it does not sign-extend the byte. For a signed char it sign-extends the byte to a long, here turning 0xff into 0xffffffff. Note also that printf("%d", p); prints the integer value (255 in program_two), not the character that corresponds to that code. To print the character, use the form printf("%c", p);. If we write

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        unsigned char p = 255;
        printf("%c", p);
    }

then with %c the byte is output as a character, and the glyph that appears is whatever the character table in use maps that code to. 0xff lies outside 7-bit ASCII; on my system it displays as a blank, like a space.

[Edited by bloves, 2004-01-16 20:14]

Reply (2nd floor), Bloves, 2004-01-17 03:23: How does the compiler distinguish signed from unsigned?

In a word, by the declaration. If the type is lexically unsigned char, the generated code contains no sign-extension instruction. If it is signed char or plain char, a sign-extension instruction is included.

Why does one program print 128 and the other -128?
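Before turning to that question, the rule stated above can be checked from C itself. The following is only a minimal sketch, not part of the original post: the helper names widen_as_signed and widen_as_unsigned and the DEMO_MAIN macro are made up for illustration, and the -1 result assumes a two's-complement machine such as the x86 used here. Each helper takes the same 8-bit pattern and widens it to int, once through signed char (what movsbl does) and once through unsigned char (upper bits left at 0).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reinterpret the 8 bits as a signed char, then widen
 * to int. The widening is a sign extension, as movsbl does in the dump. */
int widen_as_signed(unsigned char bits)
{
    signed char sc;
    memcpy(&sc, &bits, 1);   /* copy the bit pattern, no value conversion */
    return sc;               /* signed char -> int: sign bit is replicated */
}

/* Hypothetical helper: widen the same 8 bits as an unsigned char.
 * The upper bits of the int stay 0. */
int widen_as_unsigned(unsigned char bits)
{
    return bits;             /* unsigned char -> int: value is preserved */
}

#ifdef DEMO_MAIN             /* compile with -DDEMO_MAIN to run the demo */
int main(void)
{
    /* Same byte 0xff, two different integers after widening. */
    assert(widen_as_signed(0xFF) == -1);     /* two's complement assumed */
    assert(widen_as_unsigned(0xFF) == 255);

    printf("signed:   %d\n", widen_as_signed(0xFF));
    printf("unsigned: %d\n", widen_as_unsigned(0xFF));
    return 0;
}
#endif
```

Passing a char to printf goes through the same widening, because arguments in the variable part of the argument list are promoted to int before the call. That is why the two programs above print different numbers for the same stored byte.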
Because the second program uses a signed type, the compiler generates a sign-extension instruction. A signed char can represent only -128 to 127, so 127 + 1 wraps around to the start of the range, -128. Likewise 127 + 2 gives -127 and 127 + 3 gives -126.

    #include <stdio.h>

    int main(void)
    {
        unsigned char a = 127;
        a = a + 1;
        printf("a = %d\n", a);
    }

The program above uses an unsigned type, so the compiler does not sign-extend, and the output is 128. Next:

    int main(void)
    {
        char a = 127;
        a = a + 1;
        printf("a = %d\n", a);
    }

The key is in these instructions:

    call   0x401370 <__main>
    movb   $0x7f,0xffffffff(%ebp)    // 0x7f is 127 in decimal
    lea    0xffffffff(%ebp),%eax
    incb   (%eax)                    // 127 + 1 = 128, which is 0x80, binary 10000000;
                                     // note this is a signed value and its sign bit is now 1
    sub    $0x8,%esp
    movsbl 0xffffffff(%ebp),%eax     // movsbl sign-extends the byte to a long:
                                     // 11111111 11111111 11111111 10000000
                                     // the sign bit is 1, so 24 ones are put in front;
                                     // that bit pattern is -128 in decimal

    #include <stdio.h>

    int main(void)
    {
        unsigned char a = 122;
        a = a + 1;
        printf("a = %d\n", a);
    }

Compile and run the program; the output is 123.

    #include <stdio.h>

    int main(void)
    {
        char a = 122;
        a = a + 1;
        printf("a = %d\n", a);
    }

Compile and run the program; the output is 123. Why this result? Didn't we conclude that signed values are sign-extended? The reason is that sign extension yields a negative number only when the sign bit is 1, that is, when the byte value exceeds 127. Here the sign bit is 0. With 127 + 1 = 128 the sign bit is 1, so that result is negative. Let's take a look at the assembly code:

    0x4012a4:  movb   $0x7a,0xffffffff(%ebp)    // 0x7a is 122, binary 01111010; note the sign bit here is 0
    0x4012a8:  lea    0xffffffff(%ebp),%eax
    0x4012ab:  incb   (%eax)                    // add 1
    0x4012ad:  sub    $0x8,%esp
    0x4012b0:  movsbl 0xffffffff(%ebp),%eax     // a sign-extension instruction is generated here, so why is the result not negative?
                                                // because the sign bit is 0, the extended value is 122 + 1 = 123,
                                                // so no negative number is printed

In essence, GCC generates a sign-extending instruction for a signed character and a non-sign-extending one for an unsigned character. Whether the extended value comes out negative depends on whether the sign bit is 0 or 1.

Machine: Intel 80x86
Compiler: GCC 3.22
OS: Windows2k
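The two increment cases discussed above (127 + 1 and 122 + 1) can also be modelled at the bit level. This is a rough sketch under stated assumptions, not the code GCC emits: inc8, sign_extend8 and zero_extend8 are hypothetical names standing in for incb, movsbl and the zero-filling load, and DEMO_MAIN is an illustrative macro. The model uses only unsigned arithmetic, so it does not depend on how a particular compiler converts an out-of-range value to char.

```c
#include <assert.h>
#include <stdio.h>

/* Model of incb: add 1 and keep only the low 8 bits. */
unsigned int inc8(unsigned int bits)
{
    return (bits + 1u) & 0xFFu;
}

/* Model of movsbl: if bit 7 (the sign bit) is set, the widened value
 * is the byte value minus 256; otherwise it is the byte value itself. */
int sign_extend8(unsigned int bits)
{
    bits &= 0xFFu;
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}

/* Model of the unsigned case: the upper bits are simply 0. */
int zero_extend8(unsigned int bits)
{
    return (int)(bits & 0xFFu);
}

#ifdef DEMO_MAIN             /* compile with -DDEMO_MAIN to run the demo */
int main(void)
{
    /* 0x7f + 1 = 0x80: sign bit is 1, so the two widenings differ. */
    assert(sign_extend8(inc8(0x7F)) == -128);
    assert(zero_extend8(inc8(0x7F)) == 128);

    /* 0x7a + 1 = 0x7b: sign bit is 0, so both give 123. */
    assert(sign_extend8(inc8(0x7A)) == 123);
    assert(zero_extend8(inc8(0x7A)) == 123);

    printf("a = %d\n", sign_extend8(inc8(0x7F)));
    printf("a = %d\n", zero_extend8(inc8(0x7F)));
    return 0;
}
#endif
```

The sign-extending path always runs for a signed byte; it only changes the printed number when bit 7 of the byte is 1.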

When reposting, please cite the original address: https://www.9cbs.com/read-91869.html
