These are the author's study notes from learning x86 assembly. Errors and omissions are inevitable, and corrections are welcome. The author will fix mistakes as they are found and publish updated versions on the author's own blog.

1. Compilation Environment

    OS:         Solaris 9 x86
    Compiler:   GCC 3.3.2
    Linker:     Solaris Link Editors 5.x
    Debug Tool: MDB
    Editor:     VI

Note: For installing and configuring the compilation environment, see the article "Installing and Setting Up a Development Environment on Solaris". MDB is the kernel debugger that ships with Solaris; here it is used as a disassembler and assembly-level debugger. On Linux, GDB can be used for disassembly and debugging instead.

2. Analysis of the Simplest C Code

To keep the problem simple, start by analyzing the assembly code generated for the simplest possible C program:

    # vi test1.c
    int main()
    {
        return 0;
    }

Compile the program to produce a binary:

    # gcc test1.c -o test1
    # file test1
    test1: ELF 32-bit LSB executable 80386 Version 1, dynamically linked, not stripped

test1 is a 32-bit little-endian ELF executable, dynamically linked, with its symbol table intact. This is the typical executable format on UNIX/Linux platforms.

Disassembling it with MDB shows the generated assembly code:

    # mdb test1
    Loading modules: [ libc.so.1 ]
    > main::dis                          ---> disassemble the main function; the general mdb command format is <address>::dis
    main:       pushl  %ebp              ---> push EBP onto the stack, saving the stack base of the function that called main
    main+1:     movl   %esp,%ebp         ---> copy ESP into EBP, setting the stack base of main
    main+3:     subl   $8,%esp
    main+6:     andl   $0xf0,%esp
    main+9:     movl   $0,%eax
    main+0xe:   subl   %eax,%esp
    main+0x10:  movl   $0,%eax           ---> set the function return value to 0
    main+0x15:  leave                    ---> copy EBP into ESP, then pop the caller's saved stack base into EBP, restoring the original stack base
    main+0x16:  ret                      ---> main returns to its caller
    >

Note: The assembly syntax here differs considerably from the one used in Intel's manuals. UNIX/Linux tools use AT&T syntax for assembly language. To learn more about AT&T syntax, see the article "Linux AT&T Assembly Language Development Guide".

Question: Who calls the main function?

At the C language level, main is the entry point of a program. In fact, the entry point of an ELF executable is not main but _start. MDB can disassemble _start as well:

    > _start::dis                        ---> disassemble starting from the address of _start
    _start:       pushl  $0
    _start+2:     pushl  $0
    _start+4:     movl   %esp,%ebp
    _start+6:     pushl  %edx
    _start+7:     movl   $0x80504b0,%eax
    _start+0xc:   testl  %eax,%eax
    _start+0xe:   je     +0xf <_start+0x1d>
    _start+0x10:  pushl  $0x80504b0
    _start+0x15:  call   -0x75 <atexit>
    _start+0x1a:  addl   $4,%esp
    _start+0x1d:  movl   $0x8060710,%eax
    _start+0x22:  testl  %eax,%eax
    _start+0x24:  je     +7 <_start+0x2b>
    _start+0x26:  call   -0x86
1) The stack grows from high addresses toward low addresses and is addressed in bytes.

2) EBP is the stack base pointer and always points to the bottom of the stack (high address); ESP is the stack pointer and always points to the top of the stack (low address).

3) When a long (32-bit) value is pushed, it is stored byte by byte at ESP-1, ESP-2, ESP-3, and ESP-4 (relative to ESP before the push), from the most significant byte to the least significant, and ESP is decremented by 4.

4) Popping a long is the reverse of pushing: the bytes at ESP-4, ESP-3, ESP-2, and ESP-1 (the same four addresses) are read back in that order into a 32-bit register, and ESP is incremented by 4.

5) The CALL instruction calls a function or procedure. It pushes the address of the next instruction onto the stack so that execution can resume there on return.

6) The RET instruction returns from a function or procedure. It pops the return address saved by the earlier CALL into the EIP register, and execution continues at the instruction following that CALL.

7) ENTER establishes the stack frame of the current function. With operands 0,0 it is equivalent to the following two instructions:

    pushl  %ebp
    movl   %esp,%ebp

LEAVE releases the stack frame of the current function or procedure. It is equivalent to the following two instructions:

    movl   %ebp,%esp
    popl   %ebp

If you disassemble a number of functions, you will find statements of the following form at function entry and return:

    pushl  %ebp          ---> push EBP onto the stack, saving the stack base of the function that called main
    movl   %esp,%ebp     ---> copy ESP into EBP, setting the stack base of main
    ...........          ---> the two instructions above are equivalent to enter $0,$0
    ...........
    leave                ---> copy EBP into ESP, then pop the caller's saved stack base into EBP, restoring the original stack base
    ret                  ---> main returns to its caller

These statements create and release the stack frame of a function or procedure.
It turns out that the compiler automatically inserts the instructions that create and release the stack frame at the entry and exit of every function.
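The push, pop, and frame setup rules listed above can be modeled in a few lines of C. This is only a minimal sketch under stated assumptions: the Machine struct, MEM_SIZE, and every function name are hypothetical and exist purely for illustration, and the 256-byte memory is a toy stand-in for a real stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: a byte-addressed memory plus ESP and EBP. */
#define MEM_SIZE 256u

typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t esp;   /* stack pointer: top of the stack (low address) */
    uint32_t ebp;   /* base pointer: base of the current frame */
} Machine;

void machine_init(Machine *m)
{
    for (uint32_t i = 0; i < MEM_SIZE; i++)
        m->mem[i] = 0;
    m->esp = MEM_SIZE;   /* an empty stack starts at the highest address */
    m->ebp = 0;
}

/* pushl: ESP drops by 4; the most significant byte lands at old ESP-1. */
void push32(Machine *m, uint32_t value)
{
    assert(m->esp >= 4);
    m->esp -= 4;
    for (int i = 0; i < 4; i++)
        m->mem[m->esp + i] = (uint8_t)(value >> (8 * i));
}

/* popl: read the same four bytes back, then ESP rises by 4. */
uint32_t pop32(Machine *m)
{
    uint32_t value = 0;
    assert(m->esp + 4 <= MEM_SIZE);
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)m->mem[m->esp + i] << (8 * i);
    m->esp += 4;
    return value;
}

/* pushl %ebp ; movl %esp,%ebp */
void enter_frame(Machine *m)
{
    push32(m, m->ebp);
    m->ebp = m->esp;
}

/* movl %ebp,%esp ; popl %ebp */
void leave_frame(Machine *m)
{
    m->esp = m->ebp;
    m->ebp = pop32(m);
}
```

Calling enter_frame, pushing any number of locals, and then calling leave_frame leaves ESP and EBP exactly as they were before enter_frame, which is the property that lets the compiler pair the two sequences mechanically.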
When a function is called:

1) EIP/EBP become the boundary of the new function's stack. When the function is called, the return EIP is pushed onto the stack first; when the stack frame is created, the EBP of the calling function is pushed next, and together with EIP it forms the boundary of the new function's stack frame.

2) EBP becomes the stack frame pointer (STP), which marks the boundary of the new function's stack. Once the stack frame is established, the stack slot that EBP points to holds the EBP of the calling function. As you can imagine, following EBP makes it possible to walk the stacks of every function in the call chain, and debuggers use exactly this feature to implement backtrace.

3) ESP always points to the top of the stack and is used to allocate stack space. Stack space for a function's local variables is usually allocated by subtracting a constant from ESP; for example, space for one integer is allocated by subtracting 4 from ESP.

4) Parameter passing and local variable access are implemented through the frame pointer EBP. Because EBP always points to the base of the current function's stack, parameters and local variables are usually accessed as follows:

    +8+xx(%ebp)    : access to function entry parameters
    -xx(%ebp)      : access to function local variables

For example, if function A calls function B and function B calls function C, the stack frames and call relationships look like this:

              +------------------------------------------+ ----> high address
              | EIP (return address into A's caller)     |
              +------------------------------------------+
          +-> | EBP (EBP of A's caller)                  | <------ EBP of current function A (the STP frame pointer)
          |   +------------------------------------------+
          |   | local variables of A                     |         accessed at A's EBP - offset A
          |   | ........                                 | <------ ESP points to A's newly allocated local variables
          |   +------------------------------------------+
    frame |   | arg n (nth parameter of function B)      |
    of A  |   +------------------------------------------+
          |   | arg .. (.. parameter of function B)      |
          |   +------------------------------------------+
          |   | arg 1 (1st parameter of function B)      |
          |   +------------------------------------------+
          |   | arg 0 (0th parameter of function B)      | <------ B's parameters are accessed at B's EBP + offset B
          |   +------------------------------------------+
          +-> | EIP (return address into A)              |
              +------------------------------------------+
          +-> | EBP (EBP of A)                           | <------ EBP of function B (the STP frame pointer)
          |   +------------------------------------------+
          |   | local variables of B                     |
          |   | ........                                 | <------ ESP points to B's newly allocated local variables
          |   +------------------------------------------+
    frame |   | arg n (nth parameter of function C)      |
    of B  |   +------------------------------------------+
          |   | arg .. (.. parameter of function C)      |
          |   +------------------------------------------+
          |   | arg 1 (1st parameter of function C)      |
          |   +------------------------------------------+
          |   | arg 0 (0th parameter of function C)      |
          |   +------------------------------------------+
          +-> | EIP (return address into B)              |
              +------------------------------------------+
          +-> | EBP (EBP of B)                           | <------ EBP of function C (the STP frame pointer)
    frame |   +------------------------------------------+
    of C  |   | local variables of C                     |
          |   | ........                                 | <------ ESP points to C's newly allocated local variables
          +-> +------------------------------------------+ ----> low address

Now analyze the meaning of the remaining statements in the test1 disassembly:

    # mdb test1
    Loading modules: [ libc.so.1 ]
    > main::dis                          ---> disassemble the main function
    main:       pushl  %ebp
    main+1:     movl   %esp,%ebp         ---> create the stack frame
    main+3:     subl   $8,%esp           ---> allocate 8 bytes of stack space
    main+6:     andl   $0xf0,%esp        ---> align the stack address to 16 bytes
    main+9:     movl   $0,%eax           ---> meaningless
    main+0xe:   subl   %eax,%esp         ---> meaningless
    main+0x10:  movl   $0,%eax           ---> set the return value of main
    main+0x15:  leave                    ---> tear down the stack frame
    main+0x16:  ret                      ---> main returns
    >

The following two statements appear to be meaningless. Are they really?

    movl   $0,%eax
    subl   %eax,%esp

Recompile test1.c with GCC's -O2 optimization:

    # gcc -O2 test1.c -o test1
    # mdb test1
    > main::dis
    main:       pushl  %ebp
    main+1:     movl   %esp,%ebp
    main+3:     subl   $8,%esp
    main+6:     andl   $0xf0,%esp
    main+9:     xorl   %eax,%eax         ---> set the return value of main, using the xorl (exclusive OR) instruction to make EAX 0
    main+0xb:   leave
    main+0xc:   ret
    >

The new disassembly is more compact than the original one, and the statements that were judged useless have been optimized away, which confirms the earlier guess.

Tip: Some statements generated by the compiler contribute nothing to the actual semantics of the program; the optimization options can remove them.

Question: Why use xorl to set the value of EAX?

Note that in the optimized code, the instruction that sets the return value in EAX changed from movl $0,%eax to xorl %eax,%eax. The reason is that, among IA32 instructions, xorl runs faster than movl here. Its encoding is also shorter: the offsets in the two listings show that xorl %eax,%eax takes 2 bytes (main+9 to main+0xb) while movl $0,%eax takes 5 bytes (main+9 to main+0xe).
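As a cross-check on the claim that movl $0,%eax followed by subl %eax,%esp does nothing useful, the ESP arithmetic of the unoptimized prologue can be replayed in C. This is a minimal sketch, not compiler output: esp_after_prologue and its parameters are hypothetical names, and the mask 0xfffffff0 rests on the assumption that the $0xf0 printed by mdb is an 8-bit immediate that the CPU sign-extends to 32 bits (the 3-byte length of the andl instruction in the listing is consistent with that).

```c
#include <assert.h>
#include <stdint.h>

/* Replays the effect on ESP of the unoptimized prologue of main:
 *   pushl %ebp ; movl %esp,%ebp ; subl $8,%esp ; andl $0xf0,%esp ;
 *   movl $0,%eax ; subl %eax,%esp
 * esp_at_entry is ESP right after the call pushed the return address.
 * keep_redundant_pair selects whether the last two instructions run. */
uint32_t esp_after_prologue(uint32_t esp_at_entry, int keep_redundant_pair)
{
    uint32_t esp = esp_at_entry;

    esp -= 4;               /* pushl %ebp */
    /* movl %esp,%ebp does not change ESP */
    esp -= 8;               /* subl $8,%esp */
    esp &= 0xfffffff0u;     /* andl: clears the low 4 bits of ESP */

    if (keep_redundant_pair) {
        uint32_t eax = 0;   /* movl $0,%eax */
        esp -= eax;         /* subl %eax,%esp: subtracting 0 is a no-op */
    }
    return esp;
}
```

For a made-up entry value of 0x08047ffc, both variants return 0x08047ff0: the two extra instructions leave ESP untouched, and the result is a multiple of 16.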
Concept: Stack Alignment

So what do the following statements do?

    subl   $8,%esp
    andl   $0xf0,%esp    ---> clear the low 4 bits with andl, ensuring the stack address is 16-byte aligned

On the surface, the most direct effect of the andl statement is to make the low 4 bits of the address in ESP zero, that is, to align it to 16 bytes. (The $0xf0 that mdb prints is an 8-bit immediate that the CPU sign-extends, so the effective mask is 0xfffffff0 and only the low 4 bits are cleared.) So why do this? It turns out that some instructions on IA32 CPUs run faster when their data is aligned to 4, 8, or 16 bytes, so to improve the speed of the code it generates for IA32, GCC aligns the stack to 16 bytes by default.

The purpose of andl $0xf0,%esp is now clear, but is subl $8,%esp necessary? Suppose the stack is 16-byte aligned before main is called. After entering main, once EIP and EBP have been pushed, ESP has dropped by 8 bytes, so the low 4 bits of the stack address must be 1000, and subtracting 8 from ESP makes the low 4 bits exactly 0000. So this statement, too, appears to be there to keep the stack 16-byte aligned.

If you check the GCC manual, you will find an option that controls stack alignment:

    -mpreferred-stack-boundary=n    ---> try to keep the stack aligned to 2 to the power n bytes; n ranges from 2 to 12

By default n is 4, that is, GCC aligns the stack to 16 bytes by default to suit the requirements of most IA32 instructions.

Let us use -mpreferred-stack-boundary=2 to remove the stack alignment instructions:

    # gcc -mpreferred-stack-boundary=2 test1.c -o test1
    > main::dis
    main:       pushl  %ebp
    main+1:     movl   %esp,%ebp
    main+3:     movl   $0,%eax
    main+8:     leave
    main+9:     ret
    >

As you can see, the stack alignment instructions are gone: the IA32 stack is already 4-byte aligned, so no extra instructions are needed to align it.

So, is the stack frame pointer (STP) indispensable?

    # gcc -mpreferred-stack-boundary=2 -fomit-frame-pointer test1.c -o test
    > main::dis
    main:       movl   $0,%eax
    main+5:     ret
    >

This shows that -fomit-frame-pointer removes the STP.

Question: What are the drawbacks of removing the STP?
1) It makes debugging harder. The debugger's backtrace command relies on the STP, so without the STP that command cannot be used.

2) It makes the assembly code harder to read. Without EBP, function parameters and local variables can only be accessed as +xx(%esp), so the two kinds of access are hard to tell apart, which reduces the readability of the program.
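The dependence of backtrace on the STP can be sketched in C as a walk along the chain of saved EBP values. This is a minimal illustration rather than how any particular debugger is implemented: load32, walk_frames, and the word-indexed memory array are hypothetical, and the walk assumes that the chain ends at a saved EBP of 0, which is what the two pushl $0 instructions at the top of _start suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of stack memory as 32-bit words: reads the value
 * stored at byte address addr, which must be a multiple of 4. */
uint32_t load32(const uint32_t *mem, uint32_t addr)
{
    return mem[addr / 4];
}

/* Walks the frame chain starting at ebp. In each frame:
 *   [ebp]   holds the caller's saved EBP
 *   [ebp+4] holds the return address into the caller
 * Stores up to max return addresses in out and returns how many were
 * found. Stops when the saved EBP is 0 (assumed end of the chain). */
size_t walk_frames(const uint32_t *mem, size_t mem_words, uint32_t ebp,
                   uint32_t *out, size_t max)
{
    size_t n = 0;

    while (ebp != 0 && n < max) {
        assert(ebp % 4 == 0 && ebp / 4 + 1 < mem_words);
        out[n++] = load32(mem, ebp + 4);   /* return address of this frame */
        ebp = load32(mem, ebp);            /* step up to the caller's frame */
    }
    return n;
}
```

Code compiled with -fomit-frame-pointer never stores the caller's EBP at a known slot, so there is no chain for a loop like this to follow; a debugger then needs some other source of information to find the return addresses.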